Bug #14770
closedRegistered workflows fail to run due to spurious --collection-keep-cache=256
Description
Registering a workflow using arvados-cwl-runner --create-workflow and then attempting to run it results in:
arvados-cwl-runner: error: unrecognized arguments: --collection-keep-cache=256
I can't find any indication that arvados-cwl-runner ever supported such a command switch. The only mention in the code base appears to be in `arvados.apps/workbench/app/controllers/work_units_controller.rb` introduced in eb58fd94 on 2018-11-27:
attrs['command'] = ["arvados-cwl-runner", "--local", "--api=containers", "--project-uuid=#{params['work_unit']['owner_uuid']}", "--collection-keep-cache=#{keep_cache}", "/var/lib/cwl/workflow.json#main", "/var/lib/cwl/cwl.input.json"]
This works in 1.2.0 (3c85ce36c1188474b6b94ad534daf51cb1497274)
but fails with 1.3.0 (fdf6033451b12862a8305f4e3a4b06fbbdd498e1)
Updated by Tom Morris almost 6 years ago
- Target version set to 2019-02-13 Sprint
Updated by Lucas Di Pentima almost 6 years ago
- Status changed from New to In Progress
- Assigned To set to Lucas Di Pentima
Updated by Lucas Di Pentima almost 6 years ago
It seems that the correct param should be --collection-cache-size
Updated by Tom Morris almost 6 years ago
- Target version changed from 2019-02-13 Sprint to 2019-01-30 Sprint
Pulling onto this sprint since Lucas has already started working on it.
Updated by Lucas Di Pentima almost 6 years ago
Update at d162b10b0 - branch 14770-wb-acr-bad-param
Test run: https://ci.curoverse.com/job/developer-run-tests/1039/
Fixed the param passed to ACR from workbench.
Updated by Lucas Di Pentima almost 6 years ago
- Target version changed from 2019-01-30 Sprint to 2019-02-13 Sprint
Updated by Lucas Di Pentima almost 6 years ago
I did not because it was pretty clear to me what is the problem.
Nevertheless I've just tried it on arvbox and it failed.
Used the example at https://www.commonwl.org/user_guide/02-1st-example/, submitted the workflow executing:
arvados-cwl-runner --create-workflow --api containers 1st-tool.cwl echo-job.yml
...then went to workbench, and tried to run the created workflow and got the same failure. Here's the entire log:
2019-01-30T17:57:59.005019513Z crunch-run dev started 2019-01-30T17:57:59.005053232Z Executing container '1q71a-dz642-8bepebacm6m21dw' 2019-01-30T17:57:59.005093889Z Executing on host 'e72df3831fbd' 2019-01-30T17:57:59.117924767Z Fetching Docker image from collection '99dc77297fbd39babf7587efd0d71f36+345' 2019-01-30T17:57:59.147511628Z Using Docker image id 'sha256:ee4130288b739498d319ac83dd62a714296199f6b7d4383354161fb6336e6a46' 2019-01-30T17:57:59.278644642Z Docker image is available 2019-01-30T17:57:59.299341161Z Running [arv-mount --foreground --allow-other --read-write --crunchstat-interval=10 --file-cache 268435456 --mount-tmp tmp0 --mount-by-pdh by_id /tmp/crunch-run.1q71a-dz642-8bepebacm6m21dw.040935147/keep976348494] 2019-01-30T17:57:59.118078358Z notice: reading stats from /sys/fs/cgroup/cpuacct/cgroup.procs 2019-01-30T17:57:59.118210109Z notice: reading stats from /sys/fs/cgroup/memory/memory.stat 2019-01-30T17:57:59.118458028Z mem 274558976 cache 21878 pgmajfault 3125178368 rss 2019-01-30T17:57:59.118494020Z notice: reading stats from /sys/fs/cgroup/cpuacct/cpuacct.stat 2019-01-30T17:57:59.118580221Z notice: reading stats from /sys/fs/cgroup/cpuset/cpuset.cpus 2019-01-30T17:57:59.118601306Z cpu 1736.5700 user 605.6800 sys 4 cpus 2019-01-30T17:57:59.118673946Z notice: reading stats from /sys/fs/cgroup/blkio/blkio.io_service_bytes 2019-01-30T17:57:59.118722647Z blkio:8:0 122073088 write 3577901056 read 2019-01-30T17:57:59.119331685Z net:eth0 3212642 tx 193065701 rx 2019-01-30T17:57:59.119337804Z net:docker0 0 tx 0 rx Host Information Linux e72df3831fbd 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux CPU Information processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 158 model name : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz stepping : 9 cpu MHz : 2807.998 cache size : 6144 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase avx2 invpcid rdseed clflushopt flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf bogomips : 5615.99 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 158 model name : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz stepping : 9 cpu MHz : 2807.998 cache size : 6144 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 4 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase avx2 invpcid rdseed clflushopt flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf bogomips : 5615.99 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 158 model name : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz stepping : 9 cpu MHz : 2807.998 cache size : 6144 KB physical id : 0 siblings : 4 core id : 2 cpu cores : 4 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase avx2 invpcid rdseed clflushopt flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf bogomips : 5615.99 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 158 model name : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz stepping : 9 cpu MHz : 2807.998 cache size : 6144 KB physical id : 0 siblings : 4 core id : 3 cpu cores : 4 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase avx2 invpcid rdseed clflushopt flush_l1d bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf bogomips : 5615.99 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management: Memory Information MemTotal: 4050520 kB MemFree: 104796 kB MemAvailable: 348848 kB Buffers: 41316 kB Cached: 430836 kB SwapCached: 33864 kB Active: 2759944 kB Inactive: 968636 kB Active(anon): 2646300 kB Inactive(anon): 663692 kB Active(file): 113644 kB Inactive(file): 304944 kB Unevictable: 128 kB Mlocked: 128 kB SwapTotal: 1400828 kB SwapFree: 439240 kB Dirty: 2696 kB Writeback: 0 kB AnonPages: 3231508 kB Mapped: 241320 kB Shmem: 53260 kB Slab: 108112 kB SReclaimable: 57160 kB SUnreclaim: 50952 kB KernelStack: 13088 kB PageTables: 50396 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 3426088 kB Committed_AS: 8244960 kB VmallocTotal: 34359738367 kB VmallocUsed: 0 kB VmallocChunk: 0 kB HardwareCorrupted: 0 kB AnonHugePages: 0 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 167872 kB DirectMap2M: 4026368 kB Disk Space Filesystem 1M-blocks Used Available Use% Mounted on none 58758 29852 25900 54% / none 58758 29852 25900 54% / Disk INodes Filesystem Inodes IUsed IFree IUse% Mounted on none 3826368 1361133 2465235 36% / none 3826368 1361133 2465235 36% / { "auth_uuid": "1q71a-gj3su-ov79kuuxw2nvqqz", "command": [ "arvados-cwl-runner", "--local", "--api=containers", "--project-uuid=1q71a-tpzed-v0o0a46htvwy37d", "--collection-keep-cache=256", "/var/lib/cwl/workflow.json#main", "/var/lib/cwl/cwl.input.json" ], "container_image": "99dc77297fbd39babf7587efd0d71f36+345", "created_at": "2019-01-30T17:57:56.869829000Z", "cwd": "/var/spool/cwl", "environment": {}, "etag": "4d71pdc11hvfj5jq490qa85v1", "exit_code": null, "finished_at": null, "href": "/containers/1q71a-dz642-8bepebacm6m21dw", "kind": "arvados#container", "locked_by_uuid": "1q71a-gj3su-1c3q91f062k68fi", "log": null, "modified_at": "2019-01-30T17:57:57.863375000Z", "modified_by_client_uuid": "1q71a-ozdt8-tz3hl42ox5ksyv7", "modified_by_user_uuid": "1q71a-tpzed-000000000000000", "mounts": { "/var/lib/cwl/cwl.input.json": { "content": { "message": "Hello world!" }, "kind": "json" }, "/var/lib/cwl/workflow.json": { "content": { "$graph": [ { "baseCommand": "echo", "class": "CommandLineTool", "cwlVersion": "v1.0", "id": "#main", "inputs": [ { "default": "Hello world!", "id": "#main/message", "inputBinding": { "position": 1 }, "type": "string" } ], "outputs": [] } ], "cwlVersion": "v1.0" }, "kind": "json" }, "/var/spool/cwl": { "kind": "collection", "writable": true }, "stdout": { "kind": "file", "path": "/var/spool/cwl/cwl.output.json" } }, "output": null, "output_path": "/var/spool/cwl", "owner_uuid": "1q71a-tpzed-000000000000000", "priority": 1124351035770240, "progress": null, "runtime_auth_scopes": [ "all" ], "runtime_constraints": { "API": true, "keep_cache_ram": 268435456, "ram": 1073741824, "vcpus": 1 }, "runtime_status": {}, "runtime_user_uuid": "1q71a-tpzed-v0o0a46htvwy37d", "scheduling_parameters": {}, "started_at": null, "state": "Locked", "uuid": "1q71a-dz642-8bepebacm6m21dw" } 2019-01-30T17:58:00.348578626Z Creating Docker container 2019-01-30T17:58:00.608178060Z Attaching container streams 2019-01-30T17:58:01.536004876Z notice: reading stats from /sys/fs/cgroup/cpuacct/docker/fa32906a46e92a275a55058b0e59e03f8e417c767fb14d131edc14ef9a861b67/cgroup.procs 2019-01-30T17:58:01.536034423Z notice: monitoring temp dir /tmp/crunch-run.1q71a-dz642-8bepebacm6m21dw.040935147 2019-01-30T17:58:01.536073241Z notice: reading stats from /sys/fs/cgroup/memory/docker/fa32906a46e92a275a55058b0e59e03f8e417c767fb14d131edc14ef9a861b67/memory.stat 2019-01-30T17:58:01.536296339Z mem 28672 cache 1 pgmajfault 745472 rss 2019-01-30T17:58:01.536334304Z notice: reading stats from /sys/fs/cgroup/cpuacct/docker/fa32906a46e92a275a55058b0e59e03f8e417c767fb14d131edc14ef9a861b67/cpuacct.stat 2019-01-30T17:58:01.536363962Z notice: reading stats from /sys/fs/cgroup/cpuset/docker/fa32906a46e92a275a55058b0e59e03f8e417c767fb14d131edc14ef9a861b67/cpuset.cpus 2019-01-30T17:58:01.536380576Z cpu 0.0000 user 0.0000 sys 4 cpus 2019-01-30T17:58:01.536396875Z notice: reading stats from /sys/fs/cgroup/blkio/docker/fa32906a46e92a275a55058b0e59e03f8e417c767fb14d131edc14ef9a861b67/blkio.io_service_bytes 2019-01-30T17:58:01.536424292Z blkio:8:0 0 write 16384 read 2019-01-30T17:58:01.536505593Z net:eth0 3225951 tx 193066295 rx 2019-01-30T17:58:01.536516550Z net:docker0 0 tx 0 rx 2019-01-30T17:58:01.536528821Z statfs 27157434368 available 31301435392 used 61612142592 total 2019-01-30T17:58:01.035332136Z Starting Docker container id 'fa32906a46e92a275a55058b0e59e03f8e417c767fb14d131edc14ef9a861b67' 2019-01-30T17:58:02.098692346Z Waiting for container to finish 2019-01-30T17:58:03.879214047Z usage: arvados-cwl-runner [-h] [--basedir BASEDIR] [--outdir OUTDIR] 2019-01-30T17:58:03.879214047Z [--eval-timeout EVAL_TIMEOUT] 2019-01-30T17:58:03.879214047Z [--print-dot | --version | --validate] 2019-01-30T17:58:03.879214047Z [--verbose | --quiet | --debug] [--metrics] 2019-01-30T17:58:03.879214047Z [--tool-help] [--enable-reuse | --disable-reuse] 2019-01-30T17:58:03.879214047Z [--project-uuid UUID] [--output-name OUTPUT_NAME] 2019-01-30T17:58:03.879214047Z [--output-tags OUTPUT_TAGS] 2019-01-30T17:58:03.879214047Z [--ignore-docker-for-reuse] 2019-01-30T17:58:03.879214047Z [--submit | --local | --create-template | --create-workflow | --update-workflow UUID] 2019-01-30T17:58:03.879214047Z [--wait | --no-wait] 2019-01-30T17:58:03.879214047Z [--log-timestamps | --no-log-timestamps] 2019-01-30T17:58:03.879214047Z [--api {jobs,containers}] [--compute-checksum] 2019-01-30T17:58:03.879214047Z [--submit-runner-ram SUBMIT_RUNNER_RAM] 2019-01-30T17:58:03.879214047Z [--submit-runner-image SUBMIT_RUNNER_IMAGE] 2019-01-30T17:58:03.879214047Z [--always-submit-runner] 2019-01-30T17:58:03.879214047Z [--submit-request-uuid UUID | --submit-runner-cluster CLUSTER_ID] 2019-01-30T17:58:03.879214047Z [--collection-cache-size COLLECTION_CACHE_SIZE] 2019-01-30T17:58:03.879214047Z [--name NAME] [--on-error {stop,continue}] 2019-01-30T17:58:03.879214047Z [--enable-dev] [--storage-classes STORAGE_CLASSES] 2019-01-30T17:58:03.879214047Z [--intermediate-output-ttl N] [--priority PRIORITY] 2019-01-30T17:58:03.879214047Z [--thread-count THREAD_COUNT] 2019-01-30T17:58:03.879214047Z [--http-timeout HTTP_TIMEOUT] 2019-01-30T17:58:03.879214047Z [--trash-intermediate | --no-trash-intermediate] 2019-01-30T17:58:03.879214047Z workflow ... 2019-01-30T17:58:03.879214047Z arvados-cwl-runner: error: unrecognized arguments: --collection-keep-cache=256 2019-01-30T17:58:04.167031084Z Container exited with code: 2 2019-01-30T17:58:04.287473388Z Complete 2019-01-30T17:58:04.762585270Z Running [arv-mount --unmount-timeout=8 --unmount /tmp/crunch-run.1q71a-dz642-8bepebacm6m21dw.040935147/keep976348494] 2019-01-30T17:58:05.700347731Z crunch-run finished
Updated by Lucas Di Pentima almost 6 years ago
For the completeness' sake, I've manually tested the fix and it worked.
Updated by Peter Amstutz almost 6 years ago
Lucas Di Pentima wrote:
For the completeness' sake, I've manually tested the fix and it worked.
Thanks. It is always good to do at least a manual test even for "obvious" fixes because you never know when you overlook some trivial syntax error :-)
Looks good to me, please merge.
Updated by Lucas Di Pentima almost 6 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|c5a89fc450c90216ac2100e537189638a35bdc02.