[Python SDK] Retry on HTTP 5xx errors
This sounds like what #3147 was intended to address, but it's apparently not working:
Traceback (most recent call last):
  File "./myg_runs.py", line 244, in <module>
    main()
  File "./myg_runs.py", line 230, in main
    dump_subprojects(stats, project, SKIP_PROJECTS)
  File "./myg_runs.py", line 210, in dump_subprojects
    dump_pipeline_instances(stats, sp)
  File "./myg_runs.py", line 182, in dump_pipeline_instances
    time = dump_pipeline_instance(stats, i)
  File "./myg_runs.py", line 167, in dump_pipeline_instance
    dump_jobs(batchid, sample, cwl_runner['job']['components'])
  File "./myg_runs.py", line 84, in dump_jobs
    jobs = api.jobs().list(filters=[['uuid','=',job_uuid]]).execute()
  File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 140, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 502 when requesting https://e51c5.arvadosapi.com/arvados/v1/jobs?alt=json&filters=%5B%5B%22uuid%22%2C+%22%3D%22%2C+%22e51c5-8i9sb-b8od8nvombxq3h3%22%5D%5D returned "Bad Gateway">
#2 Updated by Lucas Di Pentima about 3 years ago
It seems that the api client object already has a default retry value of 2 (https://github.com/curoverse/arvados/blob/master/sdk/python/arvados/api.py#L33), and the retry code may be missing some exception catching:
#5 Updated by Tom Morris almost 2 years ago
The current SDK defaults are 2 retries with an initial sleep period of 2 seconds and a multiplier of 2, which translates to 3 attempts over 6 seconds (at 0, 2, and 6 seconds, since the second sleep is 4 seconds long).
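To make the schedule concrete, here is a small sketch of the arithmetic. It assumes the client sleeps after each failed attempt and multiplies the delay after each sleep, and it ignores the time the requests themselves take. The function name and parameters are made up for illustration and are not the SDK's API.

```python
def attempt_offsets(retries=2, initial_delay=2.0, multiplier=2.0):
    """Return the offset in seconds at which each attempt would start.

    Hypothetical helper: assumes one sleep after every failed attempt,
    with the delay multiplied after each sleep.
    """
    offsets = [0.0]          # the first attempt happens immediately
    delay = initial_delay
    for _ in range(retries):
        offsets.append(offsets[-1] + delay)
        delay *= multiplier
    return offsets


if __name__ == "__main__":
    print(attempt_offsets())  # [0.0, 2.0, 6.0]
```

With these defaults a server outage longer than about 6 seconds will still surface as an exception.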
Although it doesn't look like we're using it, the Google API client library has retry support built in:
but their algorithm is different: it uses randomization with a fixed base period and multiplier:
sleep_time = rand() * 2 ** retry_num
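For comparison, a sketch of what that formula yields per retry. It assumes `retry_num` starts at 1 for the first retry and that `rand()` returns a float in [0, 1). The helper name and the injectable `rand` argument are illustrative only and are not part of the Google API client library.

```python
import random


def jittered_sleep_times(num_retries, rand=random.random):
    """Return one randomized sleep time (seconds) per retry.

    Applies sleep_time = rand() * 2 ** retry_num, assuming retry_num
    counts from 1. `rand` is injectable so the result can be made
    deterministic in tests.
    """
    return [rand() * 2 ** retry_num for retry_num in range(1, num_retries + 1)]


if __name__ == "__main__":
    # Upper bounds (rand() == 1.0) for two retries are 2 and 4 seconds,
    # but the actual sleeps are usually shorter because of the jitter.
    print(jittered_sleep_times(2))
```

Because each sleep is scaled by a random factor, the expected total wait under these assumptions is about half that of a fixed schedule with the same base and multiplier.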
The only indication as to whether retries were attempted is a debug-level logging message, so I suggest we upgrade that to warning level, like the Google API client library does. Without that, there's no way to tell whether the exception came on the final attempt and wasn't intended to be caught, or whether it's a retryable exception that's not being caught for some reason while the retries are still in progress.
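A minimal sketch of the kind of logging proposed here, not the SDK's actual retry code. `TransientError`, `call_with_retries`, and the logger name are hypothetical stand-ins. The `sleep` argument is injectable so the loop can be exercised without waiting.

```python
import logging
import time

logger = logging.getLogger("retry_sketch")  # hypothetical logger name


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an HTTP 502."""


def call_with_retries(func, retries=2, initial_delay=2.0, multiplier=2.0,
                      sleep=time.sleep):
    """Call func(), retrying on TransientError and logging at warning level."""
    delay = initial_delay
    total = retries + 1
    for attempt in range(1, total + 1):
        try:
            return func()
        except TransientError as err:
            if attempt == total:
                # Final attempt: say so, then let the exception propagate.
                logger.warning("attempt %d of %d failed (%s); giving up",
                               attempt, total, err)
                raise
            logger.warning("attempt %d of %d failed (%s); retrying in %.1f s",
                           attempt, total, err, delay)
            sleep(delay)
            delay *= multiplier
```

With messages like these in the log, a traceback can be matched to either "giving up" (retries were exhausted) or no warning at all (the exception type was never treated as retryable).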