Bug #12684

[Python SDK] Retry on HTTP 5xx errors

Added by Tom Morris over 1 year ago. Updated about 1 month ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Story points:
-

Description

This sounds like what #3147 was intended to address, but it's apparently not working:

Traceback (most recent call last):
  File "./myg_runs.py", line 244, in <module>
    main()
  File "./myg_runs.py", line 230, in main
    dump_subprojects(stats, project, SKIP_PROJECTS)
  File "./myg_runs.py", line 210, in dump_subprojects
    dump_pipeline_instances(stats, sp)
  File "./myg_runs.py", line 182, in dump_pipeline_instances
    time = dump_pipeline_instance(stats, i)
  File "./myg_runs.py", line 167, in dump_pipeline_instance
    dump_jobs(batchid, sample, cwl_runner['job']['components'])
  File "./myg_runs.py", line 84, in dump_jobs
    jobs = api.jobs().list(filters=[['uuid','=',job_uuid]]).execute()
  File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 140, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 502 when requesting https://e51c5.arvadosapi.com/arvados/v1/jobs?alt=json&filters=%5B%5B%22uuid%22%2C+%22%3D%22%2C+%22e51c5-8i9sb-b8od8nvombxq3h3%22%5D%5D returned "Bad Gateway">

Related issues

Related to Arvados - Bug #3147: [SDKs] Python clients should automatically retry failed API and Keep requests (including timeouts), in order to survive temporary outages like server restarts and network blips.Resolved2014-08-22

History

#1 Updated by Tom Morris over 1 year ago

  • Related to Bug #3147: [SDKs] Python clients should automatically retry failed API and Keep requests (including timeouts), in order to survive temporary outages like server restarts and network blips. added

#2 Updated by Lucas Di Pentima over 1 year ago

It seems that the api client object already has a default retry value of 2 (https://github.com/curoverse/arvados/blob/master/sdk/python/arvados/api.py#L33), and the retry code may be missing some exception catching:

https://github.com/curoverse/arvados/blob/master/sdk/python/arvados/api.py#L69-L101

#3 Updated by Peter Amstutz over 1 year ago

  • Tracker changed from Feature to Bug

#5 Updated by Tom Morris about 1 month ago

The current SDK defaults are 2 retries with an initial sleep period of 2 seconds and a multiplier of 2 which translates to 3 attempts over 6 seconds (at 0, 2, 4 seconds).

Although it doesn't look like we're using it, the Google API client library has retry support built in:

https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.http-module.html#_retry_request
https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.http-pysrc.html#HttpRequest.execute

but their algorithm is different due to the use of randomization and a fixed base period and multiplier

       sleep_time = rand() * 2 ** retry_num 

The only indication as to whether retries were attempted is a debug level logging message, so I suggest we upgrade that to warning level, like the Google API client library does. Without that there's no way to tell whether the exception came on the final attempt and wasn't intended to be caught or whether it's a retryable exception that's not being caught for some reason while the retries are still in process.

Also available in: Atom PDF