Bug #3147

Updated by Tom Clegg about 5 years ago

Client libraries to address:
* Perl (arguably most important, because crunch-job uses it)
* Python (second most important because most crunch scripts use it)
* Ruby
* arv-run-pipeline-instance (assuming it's still not using the Ruby SDK)
* Workbench (assuming it's still not using the Ruby SDK)
* arv (assuming it's still not using the Ruby SDK)
* Java
* Go

Desired behavior:
* Transactions that time out or produce 5xx errors should be reattempted after a delay
* Transactions that produce 4xx errors should not be reattempted
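
The two rules above could be sketched roughly as follows. This is only an illustration, not code from any Arvados SDK: @call_with_retries@, @should_retry@, the @send@ callable, and the @max_tries@ / @delay@ defaults are all hypothetical names and values, and the sketch assumes Python 3, where timeouts and refused connections surface as @OSError@ subclasses.

```python
import time


def should_retry(status):
    """Return True when a transaction is worth reattempting.

    status is an HTTP status code, or None when no response arrived
    (timeout, connection refused). Only 5xx and None are retried.
    """
    return status is None or 500 <= status <= 599


def call_with_retries(send, max_tries=3, delay=1.0, sleep=time.sleep):
    """Call send() until it stops failing transiently or tries run out.

    send is a hypothetical callable that returns (status, body) and may
    raise OSError (socket timeout, connection refused) when no response
    is received. Returns the last (status, body) pair observed; body is
    the exception object when no response was received.
    """
    last = None
    for attempt in range(max_tries):
        try:
            status, body = send()
        except OSError as err:
            # No HTTP response at all: treat like a timeout.
            status, body = None, err
        if not should_retry(status):
            # Success, redirect, or 4xx: return without reattempting.
            return status, body
        last = (status, body)
        if attempt < max_tries - 1:
            # Wait before the next attempt, but not after the final one.
            sleep(delay)
    return last
```

A caller would wrap each API or Keep request in a function passed as @send@. A fixed delay is used here for simplicity; whether each client library should use a fixed or increasing delay is left open by this ticket.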

Background/example:

Keep services were restarted during an upload, which made them temporarily unavailable. In this case arv-put crashes with an exception instead of retrying for a while.

<pre>
Exception in thread Thread-72:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 213, in run
body=self.args['data'])
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1593, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1335, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1300, in _conn_request
conn.connect()
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 913, in connect
raise socket.error, msg
error: [Errno 111] Connection refused

Traceback (most recent call last):
File "/usr/local/bin/arv-put", line 4, in <module>
main()
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 376, in main
path, max_manifest_depth=args.max_manifest_depth)
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 292, in write_directory_tree
path, stream_name, max_manifest_depth)
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 270, in write_directory_tree
self.do_queued_work()
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 197, in do_queued_work
self._work_file()
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 210, in _work_file
self.write(buf)
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 494, in write
return super(ResumableCollectionWriter, self).write(data)
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 281, in write
self.flush_data()
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/put.py", line 268, in flush_data
super(ArvPutCollectionWriter, self).flush_data()
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 286, in flush_data
self._current_stream_locators += [Keep.put(data_buffer[0:self.KEEP_BLOCK_SIZE])]
File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 119, in put
return Keep.global_client_object().put(data, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 485, in put
(data_hash, want_copies, have_copies))
arvados.errors.KeepWriteError: Write fail for 6afcd3c55f8c02043815464f33e4d52a: wanted 2 but wrote 1
</pre>
