Bug #12404

Parallel a-c-r runs interfere in Docker uploads

Added by Peter Amstutz 2 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

Copied from https://dev.arvados.org/issues/12355#note-9

If I pass cwltest the -j=8 parameter (for instance) to run 8 tests at a time, arvados-cwl-runner fails like this:

2017-10-03 23:13:58 arvados.arv_put INFO: Resuming upload from cache file /root/.cache/arvados/arv-put/c5dadc18a2dc00619c0a24e33ed5e703
2017-10-03 23:13:58 arvados.arv_put ERROR: arv-put: Another process is already uploading this data.
         Use --no-cache if this is really what you want.
2017-10-03 23:13:58 cwltool ERROR: Workflow error, try again with --debug for more information:
v1.0/cat3-tool.cwl:7:5: keepdocker exited with code 1

The failures all come from multiple jobs trying to arv-put the same Docker images at once via arv-keepdocker.

We need to isolate the arv-keepdocker calls so that they either share the work (since they are uploading the same image) or at least don't interfere with each other.
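
One possible way to get the "share the work" behaviour is to serialize uploads per image with an advisory file lock, then re-check whether the image is already in Keep once the lock is held. The sketch below only illustrates that idea and is not how arvados-cwl-runner or arv-keepdocker currently work. The names image_lock and upload_image_once, the lock_dir argument, and the already_uploaded and upload callbacks are all hypothetical; fcntl.flock is Unix-only.

```python
import fcntl
import hashlib
import os
from contextlib import contextmanager


@contextmanager
def image_lock(image_name, lock_dir):
    """Hold an exclusive advisory lock keyed on the Docker image name.

    Hypothetical helper: lock_dir is any directory shared by the
    parallel runner processes.
    """
    os.makedirs(lock_dir, exist_ok=True)
    digest = hashlib.sha256(image_name.encode("utf-8")).hexdigest()
    path = os.path.join(lock_dir, digest + ".lock")
    with open(path, "w") as lock_file:
        # Blocks until any other holder of this image's lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def upload_image_once(image_name, already_uploaded, upload, lock_dir):
    """Upload image_name unless another process got there first.

    already_uploaded and upload are caller-supplied callables (assumed
    here; in practice upload would wrap the arv-keepdocker call).
    Returns True if this call performed the upload, False if it was skipped.
    """
    with image_lock(image_name, lock_dir):
        # Re-check under the lock: a competing process may have finished
        # the upload while this one was waiting.
        if already_uploaded(image_name):
            return False
        upload(image_name)
        return True
```

With this shape, the first process to take the lock does the upload, and the others block until it finishes and then skip the upload instead of failing with "Another process is already uploading this data."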


Related issues

Related to Arvados - Bug #12355: run-arvados-cwl-conformance-tests really slow (Resolved)

History

#1 Updated by Peter Amstutz 2 months ago

  • Description updated (diff)

#2 Updated by Ward Vandewege 2 months ago

This would also greatly speed up the CWL test suite that we run on 4xphq, c97qk and 9tee4.
