Feature #19972
Closed
Go arvados.Client retry with backoff
Description
arvados.Client currently does not retry a request that fails with a 5xx error.
5xx errors should be retried with a random backoff time. Possible behavior (Fibonacci sequence):
1st attempt, wait 0-1 seconds
2nd attempt, wait 1-2 seconds
3rd attempt, wait 2-3 seconds
4th attempt, wait 3-5 seconds
5th attempt, wait 5-8 seconds
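The Fibonacci-style windows listed above could be computed along these lines. This is only a sketch: `fibWindow` and `fibDelay` are hypothetical names, not part of arvados.Client, and the cap on the attempt number is an arbitrary guard against overflow.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// fibWindow returns the lower and upper bound of the wait before the
// given attempt (1-based): 0-1s, 1-2s, 2-3s, 3-5s, 5-8s, ...
// Hypothetical helper, not part of arvados.Client.
func fibWindow(attempt int) (lo, hi time.Duration) {
	if attempt <= 1 {
		return 0, time.Second
	}
	if attempt > 30 {
		attempt = 30 // arbitrary cap so the bounds cannot overflow
	}
	a, b := time.Second, 2*time.Second
	for i := 2; i < attempt; i++ {
		a, b = b, a+b
	}
	return a, b
}

// fibDelay picks a uniformly random wait in [lo, hi) for the attempt.
func fibDelay(attempt int) time.Duration {
	lo, hi := fibWindow(attempt)
	return lo + time.Duration(rand.Int63n(int64(hi-lo)))
}

func main() {
	for attempt := 1; attempt <= 5; attempt++ {
		lo, hi := fibWindow(attempt)
		fmt.Printf("attempt %d: wait %v-%v, drew %v\n", attempt, lo, hi, fibDelay(attempt))
	}
}
```

Because each window starts where the previous one ended, every client waits at least as long as the previous window's upper bound, while the random offset inside the window spreads clients apart.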
Another approach is randomized exponential backoff. As it happens, this is what the Google Python client does:
1st attempt, wait 0-1 seconds
2nd attempt, wait 0-2 seconds
3rd attempt, wait 0-4 seconds
4th attempt, wait 0-8 seconds
5th attempt, wait 0-16 seconds
(In this case, it probably makes sense to set a ceiling on the total amount of time spent retrying rather than on the number of retries.)
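A sketch of the randomized exponential schedule, bounded by total waiting time rather than attempt count, might look like this. The names `expCeiling`, `expDelay`, `schedule` and `maxShift` are made up for illustration, and the sketch ignores the time spent on the requests themselves.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// maxShift caps the doubling so the upper bound cannot overflow; the
// value is an arbitrary choice for this sketch.
const maxShift = 10

// expCeiling returns the upper bound of the wait before the given
// attempt (1-based): 1s, 2s, 4s, 8s, 16s, ...
func expCeiling(attempt int) time.Duration {
	shift := attempt - 1
	if shift < 0 {
		shift = 0
	}
	if shift > maxShift {
		shift = maxShift
	}
	return time.Second << shift
}

// expDelay picks a uniformly random wait in [0, expCeiling(attempt)).
func expDelay(attempt int) time.Duration {
	return time.Duration(rand.Int63n(int64(expCeiling(attempt))))
}

// schedule draws successive delays until their sum reaches window,
// truncating the last delay so the total never exceeds the window.
func schedule(window time.Duration) []time.Duration {
	var delays []time.Duration
	var total time.Duration
	for attempt := 1; total < window; attempt++ {
		d := expDelay(attempt)
		if total+d > window {
			d = window - total
		}
		delays = append(delays, d)
		total += d
	}
	return delays
}

func main() {
	for attempt := 1; attempt <= 5; attempt++ {
		fmt.Printf("attempt %d: wait 0-%v, drew %v\n", attempt, expCeiling(attempt), expDelay(attempt))
	}
	fmt.Println("delays within one minute:", schedule(time.Minute))
}
```

Since every delay can be as small as zero, the number of attempts that fit in a given window varies from run to run, which is one reason a time ceiling is easier to reason about here than a retry count.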
The idea is that if the API server is overloaded, the clients should each select different retry intervals so that they don't flood the API server by all retrying at the same fixed time.
The caller should be able to control the retry behavior, e.g.,
client.RetryWindow = time.Minute
means return an error if the request is still failing after 1 minute of retrying on the prevailing schedule
client.RetryWindow = 0
means don't retry automatically
A request that uses a caller-provided context should also make sure to notice and return early when the context cancels during a retry delay.
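To make the interaction between the retry window and context cancellation concrete, here is a minimal sketch of a retry loop. It is not the arvados.Client implementation: only the RetryWindow name comes from this ticket, while `retrier`, `Base`, `Do`, `DoContext` and `cancelDuringDelay` are hypothetical. It also assumes the delay is clipped to the time remaining in the window, which the ticket does not specify.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// retrier is a hypothetical stand-in for the retry-related part of
// arvados.Client; only the RetryWindow name comes from the ticket.
type retrier struct {
	RetryWindow time.Duration // 0 means don't retry automatically
	Base        time.Duration // assumed upper bound of the first delay
}

// Do runs attempt without a caller-provided context.
func (r retrier) Do(attempt func() (status int, err error)) error {
	return r.DoContext(context.Background(), attempt)
}

// DoContext calls attempt until it reports a non-5xx status, the retry
// window has elapsed, or ctx is done, whichever comes first.
func (r retrier) DoContext(ctx context.Context, attempt func() (int, error)) error {
	base := r.Base
	if base <= 0 {
		base = time.Second
	}
	start := time.Now()
	for n := 0; ; n++ {
		status, err := attempt()
		if err != nil {
			return err // transport errors are out of scope for this sketch
		}
		if status < 500 {
			return nil // other statuses are left to the caller
		}
		lastErr := fmt.Errorf("server returned status %d", status)
		if time.Since(start) >= r.RetryWindow {
			return lastErr
		}
		shift := n
		if shift > 6 {
			shift = 6 // arbitrary cap on the doubling
		}
		// Randomized exponential backoff: uniform in [0, base*2^n).
		delay := time.Duration(rand.Int63n(int64(base << shift)))
		if remaining := r.RetryWindow - time.Since(start); delay > remaining {
			delay = remaining // assumption: never sleep past the window
		}
		timer := time.NewTimer(delay)
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err() // return early instead of finishing the delay
		case <-timer.C:
		}
	}
}

// cancelDuringDelay demonstrates the early return: the context expires
// long before the one-minute retry window does.
func cancelDuringDelay() error {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	r := retrier{RetryWindow: time.Minute, Base: time.Second}
	return r.DoContext(ctx, func() (int, error) { return 503, nil })
}

func main() {
	calls := 0
	r := retrier{RetryWindow: time.Minute, Base: 10 * time.Millisecond}
	err := r.Do(func() (int, error) {
		calls++
		if calls < 3 {
			return 503, nil
		}
		return 200, nil
	})
	fmt.Println("calls:", calls, "err:", err)
	fmt.Println("cancelled:", cancelDuringDelay())
}
```

Waiting in a `select` on both `ctx.Done()` and the timer, rather than calling `time.Sleep`, is what lets the loop notice cancellation during a retry delay.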