Bug #12903
arvados-cwl-runner only loads cwl $import directives when run with `--local`
Description
When an $import: directive is used in a CWL workflow, arvados-cwl-runner works when using `--local`, but otherwise fails with an error such as:
2018-01-04T00:18:51.326623454Z stderr [Errno 2] No such file or directory: '/tmp/arvados-pipelines/cwl/tools/samtools/samtools-docker.yml'
This particular one was for a CommandLineTool containing:
requirements:
  - $import: samtools-docker.yml
I guess that the import files are not being staged into the container that runs the workflow.
Updated by Joshua Randall about 7 years ago
Actually, it looks like it does upload the imported local file, but it doesn't seem to rewrite the import reference to point to the file in Keep:
2018-01-04 00:57:24 arvados.arv-run INFO: Upload local files: "samtools-docker.yml"
2018-01-04 00:57:24 arvados.arv-run INFO: Using collection bde5c8f83bb5c67e93bfa1319f6767ba+65 (ncucu-4zz18-vandp94plm5df2l)
2018-01-04 00:57:24 arvados.arv-run INFO: Using collection bde5c8f83bb5c67e93bfa1319f6767ba+65 (ncucu-4zz18-on43dy4lig8uevm)
...
2018-01-04 00:57:26 arvados.cwl-runner INFO: [container gatk-3.8-haplotypecaller-genotypegvcfs-libraries.cwl] submitted container ncucu-xvhdp-jotmtoacup1g27r
2018-01-04 00:58:40 arvados.cwl-runner INFO: [container gatk-3.8-haplotypecaller-genotypegvcfs-libraries.cwl] ncucu-xvhdp-jotmtoacup1g27r is Final
2018-01-04 00:58:40 arvados.cwl-runner INFO: [container gatk-3.8-haplotypecaller-genotypegvcfs-libraries.cwl] error log:
2018-01-04 00:58:40 arvados.cwl-runner INFO:
2018-01-04T00:57:38.219161149Z crunch-run crunch-run 0.1.20171212165144.296aa66 started
2018-01-04T00:57:38.219178999Z crunch-run Executing container 'ncucu-dz642-16b37dab397yqfd'
2018-01-04T00:57:38.219195207Z crunch-run Executing on host 'arvados-compute-node-ncucu-081'
2018-01-04T00:57:38.261518904Z crunch-run Fetching Docker image from collection '53c58d1523c724bd66e04c4320e0e4bb+342'
2018-01-04T00:57:38.272505117Z crunch-run Using Docker image id 'sha256:9dd51945c95d96ee7d7b5976444c6de23f2d6b1713739298614815bffd72e73f'
2018-01-04T00:57:38.273296529Z crunch-run Loading Docker image from keep
2018-01-04T00:57:45.163186366Z crunch-run Docker response: {"stream":"Loaded image ID: sha256:9dd51945c95d96ee7d7b5976444c6de23f2d6b1713739298614815bffd72e73f\n"}
2018-01-04T00:57:45.197876280Z crunch-run Running [arv-mount --foreground --allow-other --read-write --crunchstat-interval=10 --file-cache 268435456 --mount-tmp tmp0 --mount-by-pdh by_id /tmp/keep036887651]
2018-01-04T00:57:45.500772790Z crunch-run Creating Docker container
2018-01-04T00:57:45.540047908Z crunch-run Attaching container streams
2018-01-04T00:57:45.872429084Z crunch-run Starting Docker container id 'cc4c2a608c19aa079e4595290ba44367da25a7c3845877259a8a9f9193927669'
2018-01-04T00:57:46.061317295Z crunch-run Waiting for container to finish
2018-01-04T00:57:46.392426004Z stderr cwltool INFO: /usr/bin/arvados-cwl-runner dc78526ba494973df7d298825e20503353e92adf 1.0.20171116210428, arvados-python-client 0.1.20171109204045, cwltool 1.0.20170928192020
2018-01-04T00:57:46.397656552Z stderr cwltool INFO: Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main'
2018-01-04T00:58:35.248131243Z stderr cwltool WARNING: Workflow checker warning:
2018-01-04T00:58:35.248131243Z stderr cwltool WARNING: ../../lib/cwl/workflow.json:1:87635: Source 'outOutput' of type {"items": ["null", "File"], "type": "array"} is partially incompatible
2018-01-04T00:58:35.248131243Z stderr ../../lib/cwl/workflow.json:1:82363: with sink 'gvcf_file' of type {"items": "File", "type": "array"}
2018-01-04T00:58:36.285303171Z stderr cwltool ERROR: Unhandled error, try again with --debug for more information:
2018-01-04T00:58:36.285303171Z stderr [Errno 2] No such file or directory: '/home/mercury/checkouts/arvados-pipelines/cwl/tools/samtools/samtools-docker.yml'
2018-01-04T00:58:36.552124032Z crunch-run Container exited with code: 1
2018-01-04T00:58:36.647774524Z crunch-run Complete
2018-01-04 00:58:40 arvados.cwl-runner WARNING: Overall process status is permanentFail
2018-01-04 00:58:40 arvados.cwl-runner INFO: Final output collection 5551f5d7ec57a8aa9758d14a976e87e9+57 {}
2018-01-04 00:58:40 cwltool WARNING: Final process status is permanentFail
$ arv keep ls bde5c8f83bb5c67e93bfa1319f6767ba+65
./samtools-docker.yml
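To make the behaviour I expected concrete, here is a rough sketch of the rewrite step that appears to be missing. It is not taken from the arvados-cwl-runner source: `rewrite_imports` and the `uploaded` mapping are hypothetical names, and it assumes an `$import` target may be given as a `keep:<portable data hash>/<filename>` reference once the file has been uploaded.

```python
import posixpath


def rewrite_imports(node, base_dir, uploaded):
    """Return a copy of node with local $import targets replaced by Keep references.

    uploaded is a hypothetical mapping from absolute local paths to
    "keep:<pdh>/<name>" strings for files that were already uploaded.
    Targets that are not in the mapping are left unchanged.
    """
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "$import" and isinstance(value, str):
                # Resolve the import relative to the directory of the importing file.
                local = posixpath.normpath(posixpath.join(base_dir, value))
                out[key] = uploaded.get(local, value)
            else:
                out[key] = rewrite_imports(value, base_dir, uploaded)
        return out
    if isinstance(node, list):
        return [rewrite_imports(item, base_dir, uploaded) for item in node]
    return node


# Example data modelled on the log above.
base = "/home/mercury/checkouts/arvados-pipelines/cwl/tools/samtools"
tool = {
    "class": "CommandLineTool",
    "requirements": [{"$import": "samtools-docker.yml"}],
}
uploaded = {
    base + "/samtools-docker.yml":
        "keep:bde5c8f83bb5c67e93bfa1319f6767ba+65/samtools-docker.yml",
}

print(rewrite_imports(tool, base, uploaded))
```

With a rewrite along these lines, the document submitted to the container would no longer refer to a path that only exists on the submitting host.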
Updated by Joshua Randall almost 7 years ago
I believe the CommandLineTool that is triggering this issue is: https://github.com/wtsi-hgi/arvados-pipelines/blob/58966f2211f043ee22e16b4e75b1a829dc51d36b/cwl/tools/samtools/samtools-faidx.cwl
It may or may not be important that there are actually nested $import directives: one in the requirements of the CommandLineTool, referencing samtools-docker.yml, and another in samtools-docker.yml referencing samtools-Dockerfile (in the dockerPull property). I figure you probably are not even processing the dockerPull directive, so that probably is not part of the issue here, but I wanted to mention it.
Updated by Tom Clegg almost 7 years ago
- Related to Bug #12934: [CWL] Fix Arvados bugs revealed by newly added conformance tests added
Updated by Tom Clegg almost 7 years ago
This might have been fixed in #12934 (merged Jan 16).
Updated by Joshua Randall almost 7 years ago
This continues to be a problem as of:
ii python-arvados-cwl-runner 1.0.20180216164101-3 all The Arvados CWL runner
A workflow submitted with this input file works fine:
$ cat 15x-interval-147.library-cram-to-gvcfs.noimport.001.yaml
cwl:tool: ../workflows/gatk-4.0.0.0-haplotypecaller-genotypegvcfs-libraries.cwl
library_cram:
  class: File
  location: keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram
  secondaryFiles:
    - class: File
      location: keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram.crai
chunks: 200
intersect_file:
  class: File
  location: keep:0209730ab274aa4adce0557580fa6c64+90/wgs_calling_regions.hg38.interval_list
ref_fasta_files:
  - $import: sanger_human_references.yaml
However, the same workflow with this input file fails:
$ cat 15x-interval-147.library-cram-to-gvcfs.001.yaml
cwl:tool: ../workflows/gatk-4.0.0.0-haplotypecaller-genotypegvcfs-libraries.cwl
library_cram:
  $import: 15x-interval-147.library_cram.001.yaml
chunks: 200
intersect_file:
  class: File
  location: keep:0209730ab274aa4adce0557580fa6c64+90/wgs_calling_regions.hg38.interval_list
ref_fasta_files:
  - $import: sanger_human_references.yaml
$ cat 15x-interval-147.library_cram.001.yaml
class: File
location: keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram
secondaryFiles:
  - class: File
    location: keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram.crai
Observe the difference in the cwl.input.json submitted in the container request for the failing workflow:
$ arv get ncucu-xvhdp-to7wppbrlrt8x5e | jq '.mounts["/var/lib/cwl/cwl.input.json"].content.library_cram'
{
  "class": "File",
  "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram",
  "secondaryFiles": [
    {
      "class": "File",
      "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram.crai",
      "basename": "15399492.CCXX.paired310.0f619520c3.cram.crai"
    }
  ],
  "id": "file:///home/mercury/checkouts/arvados-pipelines/cwl/15x-interval-147.library_cram.001.yaml",
  "basename": "15399492.CCXX.paired310.0f619520c3.cram"
}
As opposed to the working workflow:
$ arv get ncucu-xvhdp-kuk3ouys8d4q3lg | jq '.mounts["/var/lib/cwl/cwl.input.json"].content.library_cram'
{
  "class": "File",
  "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram",
  "secondaryFiles": [
    {
      "class": "File",
      "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram.crai",
      "basename": "15399492.CCXX.paired310.0f619520c3.cram.crai"
    }
  ],
  "basename": "15399492.CCXX.paired310.0f619520c3.cram"
}
For some reason, the $import directive is causing an `id` property to be added to the `File` object, which the submitted runner then attempts to dereference, resulting in a failure.
Note that there is no problem with `ref_fasta_files` in this workflow. There likewise is no problem with $import if you replace the `File` input with a `File[]` input and have a list of files in the imported yaml.
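For anyone wanting to check their own container requests for the same symptom, a small sketch like the following can scan a `cwl.input.json` object for leftover local `id` values. The helper name `find_local_ids` is mine, not part of any Arvados tool, and it only assumes the structure shown in the jq output above.

```python
def find_local_ids(obj, path=""):
    """Return (path, id) pairs for every object whose "id" is a file:// URI.

    Paths are written in a jq-like form, e.g. ".library_cram".
    """
    hits = []
    if isinstance(obj, dict):
        ident = obj.get("id")
        if isinstance(ident, str) and ident.startswith("file://"):
            hits.append((path or ".", ident))
        for key, value in obj.items():
            hits.extend(find_local_ids(value, f"{path}.{key}"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            hits.extend(find_local_ids(value, f"{path}[{index}]"))
    return hits


# Trimmed-down version of the failing input from this report.
failing_input = {
    "library_cram": {
        "class": "File",
        "location": "keep:af5c427f63eb6dfd527328fb224a4c15+14400/15399492.CCXX.paired310.0f619520c3.cram",
        "id": "file:///home/mercury/checkouts/arvados-pipelines/cwl/15x-interval-147.library_cram.001.yaml",
    },
    "chunks": 200,
}

for where, ident in find_local_ids(failing_input):
    print(where, ident)
```

Run against the working input, the same function returns an empty list, since that `File` object carries no `id`.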