Bug #12404

Parallel a-c-r runs interfere in Docker uploads

Added by Peter Amstutz 14 days ago. Updated 8 days ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Story points: -
Velocity based estimate: -

Description

Copied from https://dev.arvados.org/issues/12355#note-9

If I give cwltest the -j=8 parameter (for instance) to run 8 of these things at a time, arvados-cwl-runner bombs out like this:

2017-10-03 23:13:58 arvados.arv_put INFO: Resuming upload from cache file /root/.cache/arvados/arv-put/c5dadc18a2dc00619c0a24e33ed5e703
2017-10-03 23:13:58 arvados.arv_put ERROR: arv-put: Another process is already uploading this data.
         Use --no-cache if this is really what you want.
2017-10-03 23:13:58 cwltool ERROR: Workflow error, try again with --debug for more information:
v1.0/cat3-tool.cwl:7:5: keepdocker exited with code 1

The failures are all to do with multiple jobs trying to arv-put (the same) docker images via arv-keepdocker.

We need to isolate the arv-keepdocker calls so that they either share the work (since they are uploading the same image) or at least don't interfere with each other.
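
One possible way to get the "share the work" behaviour is to serialize uploads per image with an advisory file lock, so the first process uploads and the others wait and then find the image already present. The following is only a sketch under stated assumptions, not how arv-keepdocker or arvados-cwl-runner actually work: the names image_upload_lock, upload_once, is_uploaded and do_upload are hypothetical, the lock directory is an arbitrary choice, and the two callables stand in for whatever check and upload the real tools perform. It relies on fcntl.flock, so it applies to Unix-like hosts only.

```python
import fcntl
import hashlib
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def image_upload_lock(image_name, lock_dir=None):
    """Hold an exclusive advisory lock for one image name (hypothetical helper)."""
    # Assumption: a directory under the system temp dir is shared by all runners.
    lock_dir = lock_dir or os.path.join(tempfile.gettempdir(), "keepdocker-locks")
    os.makedirs(lock_dir, exist_ok=True)
    # Hash the name so tags such as "repo/image:tag" map to a safe filename.
    digest = hashlib.sha256(image_name.encode("utf-8")).hexdigest()
    lock_path = os.path.join(lock_dir, digest + ".lock")
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield lock_path
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def upload_once(image_name, is_uploaded, do_upload, lock_dir=None):
    """Upload image_name unless another runner already did; return True if we uploaded.

    is_uploaded and do_upload are caller-supplied callables (hypothetical
    stand-ins for the real "is it in Keep?" check and the real upload).
    """
    with image_upload_lock(image_name, lock_dir):
        # Re-check under the lock: a runner that held it before us may have
        # finished the upload while we were waiting.
        if is_uploaded(image_name):
            return False
        do_upload(image_name)
        return True
```

With this shape, eight parallel runners asking for the same image would perform one upload between them, while runners asking for different images would not block each other because each image name gets its own lock file.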


Related issues

Related to Arvados - Bug #12355: run-arvados-cwl-conformance-tests really slow Resolved

History

#1 Updated by Peter Amstutz 14 days ago

  • Description updated (diff)

#2 Updated by Ward Vandewege 8 days ago

This would also greatly speed up the CWL test suite that we run on 4xphq, c97qk and 9tee4.
