Bug #12404

closed

Parallel a-c-r runs interfere in Docker uploads

Added by Peter Amstutz over 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
-
Story points:
-

Description

Copied from https://dev.arvados.org/issues/12355#note-9

If I give cwltest the -j=8 parameter (for instance) to run 8 of these things at a time, arvados-cwl-runner bombs out like this:

2017-10-03 23:13:58 arvados.arv_put INFO: Resuming upload from cache file /root/.cache/arvados/arv-put/c5dadc18a2dc00619c0a24e33ed5e703
2017-10-03 23:13:58 arvados.arv_put ERROR: arv-put: Another process is already uploading this data.
         Use --no-cache if this is really what you want.
2017-10-03 23:13:58 cwltool ERROR: Workflow error, try again with --debug for more information:
v1.0/cat3-tool.cwl:7:5: keepdocker exited with code 1

The failures are all to do with multiple jobs trying to arv-put (the same) docker images via arv-keepdocker.

Need to isolate the arv-keepdocker calls so they either share the work (because they are trying to do the same thing) or at least don't interfere with each other.


Related issues

Related to Arvados - Bug #12355: run-arvados-cwl-conformance-tests really slow (Resolved, Ward Vandewege)
Actions #1

Updated by Peter Amstutz over 6 years ago

  • Description updated (diff)
Actions #2

Updated by Ward Vandewege over 6 years ago

This would also greatly speed up the CWL test suite that we run on 4xphq, c97qk and 9tee4.

Actions #3

Updated by Peter Amstutz almost 6 years ago

  • Status changed from New to Resolved

This has been fixed with a shared file lock as part of the multithreaded submission work in #13108
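
For readers unfamiliar with the technique, a shared file lock can be sketched roughly as below. This is only an illustration of the general idea, not the actual change made in #13108: the names `shared_file_lock`, `upload_once`, `state_dir` and the `.lock`/`.done` marker files are hypothetical, and the sketch assumes a POSIX system where `fcntl.flock` is available. Each caller takes an exclusive lock keyed on the image before uploading, so concurrent callers wait their turn and can skip the upload if someone else already finished it.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def shared_file_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    Hypothetical helper: every caller opens the same lock file, and
    flock() blocks until no other open file description holds the lock.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # wait here while another caller uploads
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def upload_once(image_id, state_dir, do_upload):
    """Run do_upload(image_id) at most once across concurrent callers.

    Returns True if this caller performed the upload, False if another
    caller had already completed it. The marker files are an assumption
    made for this sketch, standing in for "is the image already in Keep?".
    """
    lock_path = os.path.join(state_dir, image_id + ".lock")
    done_path = os.path.join(state_dir, image_id + ".done")
    with shared_file_lock(lock_path):
        if os.path.exists(done_path):
            return False  # someone else uploaded while we were waiting
        do_upload(image_id)
        with open(done_path, "w") as marker:
            marker.write("ok")
        return True
```

With this shape, eight parallel runs asking for the same image would serialize on the lock, one would do the upload, and the other seven would return without calling the uploader at all, which covers both the "share the work" and "don't interfere" goals from the description.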
