Bug #12684

open

Let user specify a retry strategy on the client object, used for all API calls

Added by Tom Morris over 5 years ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

Updated Feb 23, 2023

The SDK either doesn't retry at all, or doesn't retry enough. Requiring end users to pass num_retries manually on every call does not scale for the people writing the code.

  • The number of retries should be settable when creating the API object
  • The default retry count should be much more robust -- like 8 retries
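
To picture the first bullet, here is a minimal sketch of a client object that stores one retry count and applies it to every call unless the caller overrides it. Everything in it is hypothetical: `RetryingClient`, `call`, `TransientError`, and `make_flaky` are illustrative names, not part of the Arvados SDK, and the default of 8 simply mirrors the suggestion above. Delays between attempts are left out to keep it short.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as HTTP 502."""


class RetryingClient:
    """Hypothetical client that holds one retry count for all calls."""

    def __init__(self, num_retries=8):
        # Default applied to every call made through this client.
        self.num_retries = num_retries

    def call(self, func, *args, num_retries=None, **kwargs):
        # A per-call value, if given, overrides the client-wide default.
        retries = self.num_retries if num_retries is None else num_retries
        for attempt in range(retries + 1):
            try:
                return func(*args, **kwargs)
            except TransientError:
                if attempt == retries:
                    raise  # Out of retries: surface the last error.


def make_flaky(failures):
    """Return a function that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("502 Bad Gateway")
        return state["calls"]

    return flaky


client = RetryingClient(num_retries=8)
result = client.call(make_flaky(3))  # succeeds on the 4th attempt
```

The point of the design is that the retry count is chosen once, at construction, while a per-call override stays available for the rare request that needs different handling.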

Old ticket

This sounds like what #3147 was intended to address, but it's apparently not working:

Traceback (most recent call last):
  File "./myg_runs.py", line 244, in <module>
    main()
  File "./myg_runs.py", line 230, in main
    dump_subprojects(stats, project, SKIP_PROJECTS)
  File "./myg_runs.py", line 210, in dump_subprojects
    dump_pipeline_instances(stats, sp)
  File "./myg_runs.py", line 182, in dump_pipeline_instances
    time = dump_pipeline_instance(stats, i)
  File "./myg_runs.py", line 167, in dump_pipeline_instance
    dump_jobs(batchid, sample, cwl_runner['job']['components'])
  File "./myg_runs.py", line 84, in dump_jobs
    jobs = api.jobs().list(filters=[['uuid','=',job_uuid]]).execute()
  File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 140, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 502 when requesting https://e51c5.arvadosapi.com/arvados/v1/jobs?alt=json&filters=%5B%5B%22uuid%22%2C+%22%3D%22%2C+%22e51c5-8i9sb-b8od8nvombxq3h3%22%5D%5D returned "Bad Gateway">

Related issues

Related to Arvados - Bug #3147: [SDKs] Python clients should automatically retry failed API and Keep requests (including timeouts), in order to survive temporary outages like server restarts and network blips. (Resolved, Brett Smith, 08/22/2014)

Related to Arvados - Story #20107: Research retry strategies when SDK API calls return 5xx errors (New, Brett Smith)

Actions #1

Updated by Tom Morris over 5 years ago

  • Related to Bug #3147: [SDKs] Python clients should automatically retry failed API and Keep requests (including timeouts), in order to survive temporary outages like server restarts and network blips. added
Actions #2

Updated by Lucas Di Pentima over 5 years ago

It seems that the api client object already has a default retry value of 2 (https://github.com/curoverse/arvados/blob/master/sdk/python/arvados/api.py#L33), and the retry code may be missing some exception catching:

https://github.com/curoverse/arvados/blob/master/sdk/python/arvados/api.py#L69-L101

Actions #3

Updated by Peter Amstutz over 5 years ago

  • Tracker changed from Feature to Bug
Actions #5

Updated by Tom Morris about 4 years ago

The current SDK defaults are 2 retries with an initial sleep period of 2 seconds and a multiplier of 2, which translates to 3 attempts over 6 seconds (at 0, 2, and 6 seconds, ignoring the time each request takes).
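
As a rough check of this arithmetic, the sketch below works the schedule through from the stated parameters. It assumes the sleep happens after each failed attempt and that requests fail instantly, so only sleep time is counted; `attempt_offsets` is a hypothetical helper, not SDK code.

```python
def attempt_offsets(num_retries=2, initial_delay=2, multiplier=2):
    """Return the start time, in seconds, of each attempt.

    Assumes every request fails instantly, so only sleep time counts.
    """
    offsets = [0]
    delay = initial_delay
    for _ in range(num_retries):
        offsets.append(offsets[-1] + delay)
        delay *= multiplier
    return offsets


print(attempt_offsets())        # [0, 2, 6]
print(attempt_offsets(8)[-1])   # 510
```

Under the same assumptions, raising the count to 8 retries without changing the delay parameters would put the last attempt 510 seconds after the first, which may be worth keeping in mind when picking a new default.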

Although it doesn't look like we're using it, the Google API client library has retry support built in:

https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.http-module.html#_retry_request
https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.http-pysrc.html#HttpRequest.execute

Their algorithm differs from ours, though: it randomizes the delay and uses a fixed base period and multiplier:

       sleep_time = rand() * 2 ** retry_num 
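
To make that formula concrete, here is a small standalone sketch of it. It assumes `rand()` returns a value in [0, 1) and that `retry_num` starts at 1 for the first retry; `jittered_delay` is a hypothetical name, not the library's own function.

```python
import random


def jittered_delay(retry_num, rand=random.random):
    """Delay before retry number `retry_num` (1-based), in seconds."""
    return rand() * 2 ** retry_num


# Upper bounds for the first three retries, since rand() is below 1.
bounds = [jittered_delay(n, rand=lambda: 1.0) for n in (1, 2, 3)]
print(bounds)  # [2.0, 4.0, 8.0]
```

The randomization means two clients that fail at the same moment are unlikely to retry at the same moment, at the cost of a less predictable total wait.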

The only indication that retries were attempted is a debug-level log message, so I suggest we raise it to warning level, as the Google API client library does. Without that, there's no way to tell whether the exception came from the final attempt, and so was never meant to be caught, or whether it's a retryable exception that the retry code is failing to catch while retries remain.
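
A minimal sketch of what warning-level retry logging could look like, using only the standard `logging` module. The logger name, the `call_with_retries` helper, and the choice of `ConnectionError` as the retryable exception are all assumptions for illustration; sleeping between attempts is omitted.

```python
import logging

logger = logging.getLogger("retry_sketch")  # hypothetical logger name


def call_with_retries(func, num_retries=2):
    """Call func, logging a warning before each retry (no sleeping)."""
    for attempt in range(num_retries + 1):
        try:
            return func()
        except ConnectionError as error:
            if attempt == num_retries:
                raise  # Final attempt: let the caller see the error.
            logger.warning(
                "Attempt %d of %d failed (%s); retrying",
                attempt + 1, num_retries + 1, error,
            )
```

With messages like these in the log, a traceback preceded by two warnings points at an exhausted retry budget, while a traceback with no warnings suggests the exception was never caught by the retry code at all.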

Actions #6

Updated by Peter Amstutz over 1 year ago

  • Target version deleted (To Be Groomed)
Actions #7

Updated by Peter Amstutz about 2 months ago

  • Release set to 60
Actions #8

Updated by Peter Amstutz about 1 month ago

  • Target version set to To be groomed
Actions #9

Updated by Peter Amstutz about 1 month ago

  • Release deleted (60)
  • Assigned To set to Brett Smith
Actions #10

Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)
Actions #11

Updated by Brett Smith about 1 month ago

  • Category set to SDKs
  • Subject changed from [Python SDK] Retry on HTTP 5xx errors to Let user specify a retry strategy on the client object, used for all API calls
Actions #12

Updated by Brett Smith about 1 month ago

This is related to #20107. It would be really nice for us as a team to get on the same page about what our "retry philosophy" is, and then aim to implement that. It would be especially nice if all our SDKs implemented the same strategy.

Actions #13

Updated by Brett Smith about 1 month ago

  • Related to Story #20107: Research retry strategies when SDK API calls return 5xx errors added