Feature #19972

closed

Go arvados.Client retry with backoff

Added by Peter Amstutz over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Story points: 2.0
Release relationship: Auto

Description

arvados.Client currently does not retry when a request fails with a 5xx error.

5xx errors should be retried with a random backoff time. Possible behavior (Fibonacci sequence):

1st attempt, wait 0-1 seconds
2nd attempt, wait 1-2 seconds
3rd attempt, wait 2-3 seconds
4th attempt, wait 3-5 seconds
5th attempt, wait 5-8 seconds

Another approach is randomized exponential backoff. As it happens, this is what the Google Python client does:

1st attempt, wait 0-1 seconds
2nd attempt, wait 0-2 seconds
3rd attempt, wait 0-4 seconds
4th attempt, wait 0-8 seconds
5th attempt, wait 0-16 seconds

(In this case, it probably makes sense to cap the total time spent retrying rather than the number of retries.)

The idea is that if the API server is overloaded, each client should select a different retry interval, so that they don't flood the API server by all retrying at the same time.

The caller should be able to control the retry behavior, e.g.,
  • client.RetryWindow = time.Minute means return an error if the request is still failing after 1 minute of retrying on the prevailing schedule
  • client.RetryWindow = 0 means don't retry automatically

A request that uses a caller-provided context should also notice and return early when the context is canceled during a retry delay.


Subtasks 1 (0 open, 1 closed)

Task #20044: Review 19972-go-client-retry (Resolved, Tom Clegg, 03/08/2023)

Related issues

Related to Arvados - Bug #19973: Dispatcher responds to 503 errors by limiting container concurrency (Resolved, Tom Clegg, 02/16/2023)
Related to Arvados - Feature #19984: Go arvados.Client responds to 503 errors by limiting outgoing connection concurrency (Resolved, Tom Clegg, 02/21/2023)
Related to Arvados - Idea #20107: Research retry strategies when SDK API calls return 5xx errors (New, Brett Smith)
Related to Arvados - Bug #21023: Go keepclient retry does not wait between retries (Resolved, Tom Clegg)