Bug #3147
[SDKs] Python clients should automatically retry failed API and Keep requests (including timeouts), in order to survive temporary outages like server restarts and network blips.
Status: closed
Story points: 1.0
Description
Desired behavior:
- Transactions that time out or produce 5xx errors should be reattempted after a delay
- Transactions that produce 4xx errors should not be reattempted
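The desired behavior above could be sketched roughly as follows. This is only an illustration, not the SDK's actual implementation: `call_with_retries`, its parameters, and the assumption that a request is a callable returning a `(status, body)` pair are all hypothetical. The delay doubles after each failed attempt, which is one possible choice of retry schedule.

```python
import socket
import time


def call_with_retries(send, tries=4, initial_delay=1.0, backoff=2.0,
                      sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or tries run out.

    Hypothetical helper: send is assumed to be a zero-argument callable
    that returns (status, body) or raises socket.error / socket.timeout.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    delay = initial_delay
    for attempt in range(1, tries + 1):
        last_try = attempt == tries
        try:
            status, body = send()
        except (socket.timeout, socket.error):
            # Timeouts and refused connections are treated as temporary.
            if last_try:
                raise
        else:
            # 2xx/3xx/4xx responses are returned immediately; only 5xx
            # responses are reattempted, until the last try.
            if not 500 <= status <= 599 or last_try:
                return status, body
        sleep(delay)
        delay *= backoff
```

With the defaults, a request that keeps failing is attempted four times, with waits of 1, 2, and 4 seconds in between. A 4xx response is handed back to the caller after the first attempt, since repeating the same request is not expected to change the outcome.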
Background/example:
The Keep services were restarted during an upload, which made them temporarily unavailable. In this case arv-put crashes with an exception instead of retrying for a while, as the tracebacks below show.
Exception in thread Thread-72:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 213, in run
    body=self.args['data'])
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1593, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1335, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1300, in _conn_request
    conn.connect()
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 913, in connect
    raise socket.error, msg
error: [Errno 111] Connection refused

Traceback (most recent call last):
  File "/usr/local/bin/arv-put", line 4, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 376, in main
    path, max_manifest_depth=args.max_manifest_depth)
  File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 292, in write_directory_tree
    path, stream_name, max_manifest_depth)
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 270, in write_directory_tree
    self.do_queued_work()
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 197, in do_queued_work
    self._work_file()
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 210, in _work_file
    self.write(buf)
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 494, in write
    return super(ResumableCollectionWriter, self).write(data)
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 281, in write
    self.flush_data()
  File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 268, in flush_data
    super(ArvPutCollectionWriter, self).flush_data()
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 286, in flush_data
    self._current_stream_locators += [Keep.put(data_buffer[0:self.KEEP_BLOCK_SIZE])]
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 119, in put
    return Keep.global_client_object().put(data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 485, in put
    (data_hash, want_copies, have_copies))
arvados.errors.KeepWriteError: Write fail for 6afcd3c55f8c02043815464f33e4d52a: wanted 2 but wrote 1