Bug #12404

Parallel a-c-r runs interfere in Docker uploads

Added by Peter Amstutz 12 months ago. Updated 3 months ago.

Copied from https://dev.arvados.org/issues/12355#note-9

If I give cwltest the -j=8 parameter (for instance) to run 8 tests at a time, arvados-cwl-runner bombs out like this:

2017-10-03 23:13:58 arvados.arv_put INFO: Resuming upload from cache file /root/.cache/arvados/arv-put/c5dadc18a2dc00619c0a24e33ed5e703
2017-10-03 23:13:58 arvados.arv_put ERROR: arv-put: Another process is already uploading this data.
         Use --no-cache if this is really what you want.
2017-10-03 23:13:58 cwltool ERROR: Workflow error, try again with --debug for more information:
v1.0/cat3-tool.cwl:7:5: keepdocker exited with code 1

The failures are all to do with multiple jobs trying to arv-put (the same) docker images via arv-keepdocker.

Need to isolate the arv-keepdocker calls so they either share the work (because they are trying to do the same thing) or at least don't interfere with each other.

Related issues

Related to Arvados - Bug #12355: run-arvados-cwl-conformance-tests really slow (Resolved)


#1 Updated by Peter Amstutz 12 months ago

  • Description updated (diff)

#2 Updated by Ward Vandewege 12 months ago

This would also greatly speed up the CWL test suite that we run on 4xphq, c97qk and 9tee4.

#3 Updated by Peter Amstutz 3 months ago

  • Status changed from New to Resolved

This has been fixed with a shared file lock as part of the multithreaded submission work in #13108.
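
For readers unfamiliar with the approach, here is a minimal sketch of how a shared file lock can serialize concurrent uploads of the same image. It is not the code from #13108: the lock directory, the function names, and the already_uploaded/upload callables are all hypothetical stand-ins, and it assumes a Unix system where fcntl.flock is available.

```python
import fcntl
import hashlib
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock directory; not a path Arvados is known to use.
LOCK_DIR = os.path.join(tempfile.gettempdir(), "keepdocker-locks")


@contextmanager
def image_lock(image_name, lock_dir=LOCK_DIR):
    """Hold an exclusive advisory lock tied to one Docker image name."""
    os.makedirs(lock_dir, exist_ok=True)
    # Hash the name so tags such as "repo/image:tag" map to a safe filename.
    digest = hashlib.sha256(image_name.encode("utf-8")).hexdigest()
    path = os.path.join(lock_dir, digest + ".lock")
    with open(path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def ensure_uploaded(image_name, already_uploaded, upload, lock_dir=LOCK_DIR):
    """Upload image_name at most once across cooperating callers.

    already_uploaded and upload are caller-supplied callables (hypothetical
    stand-ins for a lookup of existing images and an arv-keepdocker call).
    Returns True if this call performed the upload, False otherwise.
    """
    with image_lock(image_name, lock_dir):
        # Re-check under the lock: another caller may have finished the
        # upload while this one was waiting.
        if already_uploaded(image_name):
            return False
        upload(image_name)
        return True
```

With this shape, eight parallel runs that need the same image queue on one lock; the first does the upload and the remaining seven find the image already present, so they share the work instead of colliding on the arv-put cache file.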
