Bug #3147

closed

[SDKs] Python clients should automatically retry failed API and Keep requests (including timeouts), in order to survive temporary outages like server restarts and network blips.

Added by Peter Amstutz almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Story points: 1.0

Description

Desired behavior:
  • Transactions that time out or return 5xx errors should be retried after a delay
  • Transactions that return 4xx errors should not be retried
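The desired behavior above can be sketched as a small retry loop. This is a minimal illustration, not the SDK's implementation: `call_with_retries`, `HTTPStatusError`, the `request` callable, and the exponential backoff schedule are all hypothetical choices made for this example. Timeouts and socket errors such as a refused connection are treated like 5xx responses (retried after a delay), while any other non-2xx status, including 4xx, is raised immediately.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical error carrying the HTTP status of a failed request."""

    def __init__(self, status):
        super().__init__("request failed with HTTP %d" % status)
        self.status = status


def is_retryable_status(status):
    """5xx responses are treated as transient; everything else is not."""
    return 500 <= status < 600


def call_with_retries(request, num_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call request() up to num_retries + 1 times.

    request is a hypothetical callable returning (status, body). It may raise
    OSError, which covers timeouts (socket.timeout / TimeoutError) and
    refused connections (ConnectionRefusedError).
    """
    for attempt in range(num_retries + 1):
        last_attempt = attempt == num_retries
        try:
            status, body = request()
        except OSError:
            # Timeout or network error, e.g. while the server restarts.
            if last_attempt:
                raise
        else:
            if 200 <= status < 300:
                return body
            # 4xx (and other non-5xx) failures are raised immediately.
            if not is_retryable_status(status) or last_attempt:
                raise HTTPStatusError(status)
        # Exponential backoff before the next attempt: 1x, 2x, 4x base_delay...
        sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter lets tests substitute a fake clock. Doubling the delay on each attempt gives a restarting server progressively more time to come back without hammering it.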

Background/example:

Keep services were restarted during an upload, which made them temporarily unavailable. In this case arv-put fails (and crashes with an exception) instead of retrying for a while.

Exception in thread Thread-72:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 213, in run
    body=self.args['data'])
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1593, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1335, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1300, in _conn_request
    conn.connect()
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 913, in connect
    raise socket.error, msg
error: [Errno 111] Connection refused

Traceback (most recent call last):
  File "/usr/local/bin/arv-put", line 4, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 376, in main
    path, max_manifest_depth=args.max_manifest_depth)
  File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 292, in write_directory_tree
    path, stream_name, max_manifest_depth)
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 270, in write_directory_tree
    self.do_queued_work()
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 197, in do_queued_work
    self._work_file()
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 210, in _work_file
    self.write(buf)
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 494, in write
    return super(ResumableCollectionWriter, self).write(data)
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 281, in write
    self.flush_data()
  File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 268, in flush_data
    super(ArvPutCollectionWriter, self).flush_data()
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 286, in flush_data
    self._current_stream_locators += [Keep.put(data_buffer[0:self.KEEP_BLOCK_SIZE])]
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 119, in put
    return Keep.global_client_object().put(data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 485, in put
    (data_hash, want_copies, have_copies))
arvados.errors.KeepWriteError: Write fail for 6afcd3c55f8c02043815464f33e4d52a: wanted 2 but wrote 1
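The final `KeepWriteError` shows a partial write: only 1 of the 2 wanted copies was stored, presumably because the other Keep service refused the connection (the Errno 111 in the first traceback). A minimal sketch of a copy-aware retry follows, assuming each Keep service is represented by a hypothetical `put_block(data_hash, data)` callable that raises `OSError` on transient failures; `put_with_retries` and its parameters are illustrative and do not reflect the real `arvados.keep` API. The block is identified by its MD5 hex digest, as the 32-character hash in the error message suggests. Services that already hold a copy are skipped in later rounds, so a retry only re-sends data to the services that failed.

```python
import hashlib
import time


class KeepWriteError(Exception):
    """Stand-in for arvados.errors.KeepWriteError; arvados is not imported."""


def put_with_retries(data, services, want_copies=2, num_retries=3,
                     base_delay=1.0, sleep=time.sleep):
    """Store data on want_copies distinct services, retrying failed ones.

    services is a hypothetical dict mapping a service name to a callable
    put_block(data_hash, data) that raises OSError on a transient failure.
    Returns (data_hash, sorted list of services holding a copy).
    """
    data_hash = hashlib.md5(data).hexdigest()
    stored = set()
    for attempt in range(num_retries + 1):
        for name, put_block in services.items():
            if len(stored) >= want_copies:
                break
            if name in stored:
                continue  # this service already has a copy; don't resend
            try:
                put_block(data_hash, data)
            except OSError:
                continue  # e.g. connection refused; try again next round
            stored.add(name)
        if len(stored) >= want_copies:
            return data_hash, sorted(stored)
        if attempt < num_retries:
            # Exponential backoff between rounds.
            sleep(base_delay * (2 ** attempt))
    raise KeepWriteError("Write fail for %s: wanted %d but wrote %d"
                         % (data_hash, want_copies, len(stored)))
```

With this structure, a service that is down only briefly (for example during a restart) gets another chance after each delay, and the error is raised only once all rounds are exhausted.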

Subtasks 6 (0 open, 6 closed)

Task #3735: Make crunch-job retry when it calls arv-put | Resolved | Brett Smith | 08/27/2014
Task #3802: arv-mount exposes retry support | Resolved | Brett Smith | 09/03/2014
Task #3658: PySDK Collection classes expose retry support | Resolved | Brett Smith | 08/22/2014
Task #3659: arv-put exposes retry support | Resolved | Brett Smith | 08/22/2014
Task #3657: Review 3147-pysdk-retries-wip | Resolved | Tim Pierce | 08/22/2014
Task #3872: Review 3147-py-collection-retries-wip2 | Resolved | Tim Pierce | 08/22/2014

Related issues

Related to Arvados - Bug #3351: [SDK] arv-put hangs during remote upload | Resolved | Tom Clegg | 07/28/2014
Related to Arvados - Bug #3419: [SDKs] Perl client library should retry failed API requests after errors like Gateway Timeout | Closed | 09/23/2014
Related to Arvados - Idea #3795: [Crunch/SDKs] Tasks need more retry support | Closed | 09/03/2014
Related to Arvados - Bug #12684: Let user specify a retry strategy on the client object, used for all API calls | Resolved | Brett Smith | 05/09/2023