Feature #19972

Updated by Peter Amstutz almost 2 years ago

arvados.Client currently does not retry a request that fails with a 5xx error.

5xx errors should be retried after a randomized backoff delay. One possible schedule, based on the Fibonacci sequence:

1st attempt, wait 0-1 seconds
2nd attempt, wait 1-2 seconds
3rd attempt, wait 2-3 seconds
4th attempt, wait 3-5 seconds
5th attempt, wait 5-8 seconds

Another approach is randomized exponential backoff. As it happens, this is what the Google Python client does:

1st attempt, wait 0-1 seconds
2nd attempt, wait 0-2 seconds
3rd attempt, wait 0-4 seconds
4th attempt, wait 0-8 seconds
5th attempt, wait 0-16 seconds

(In this case, it probably makes sense to cap the total time spent retrying rather than the number of retries.)

The idea is that if the API server is overloaded, clients should each pick different retry intervals, so that they don't flood the API server by all retrying at the same moment.

The caller should be able to control the retry behavior, e.g.:
* @client.RetryWindow = time.Minute@ means return an error if the request is still failing after 1 minute of retrying on the prevailing schedule
* @client.RetryWindow = 0@ means don't retry automatically

A request that uses a caller-provided context should also notice when the context is cancelled during a retry delay, and return early.
