Bug #13256
closed
Weird directory structure for CommandLineTool CWL pipeline
Description
When running a simple CommandLineTool pipeline in CWL on Arvados, the final result is placed inside a directory whose name looks like a CWL location string. Apart from the extra parent directory, the files themselves are as expected and the CWL pipeline ran successfully.
Here is a local run of arvados-cwl-runner:
$ arvados-cwl-runner --local --project-uuid su92l-j7d0g-ucwsoqnhrchk231 cwl/clt.cwl yml/clt.yml
2018-03-21 20:30:26 cwltool INFO: /usr/bin/arvados-cwl-runner 1.0.20180223182850, arvados-python-client 0.1.20180223161544, cwltool 1.0.20180130110340
2018-03-21 20:30:26 cwltool INFO: Resolved 'cwl/clt.cwl' to 'file:///home/abram/cwl/clt/cwl/clt.cwl'
2018-03-21 20:30:27 arvados.arv-run INFO: Upload local files: "create-simple-files.sh"
2018-03-21 20:30:28 arvados.arv-run INFO: Uploaded to 741010cc1967c84c3e191a6114cfff1a+66 (su92l-4zz18-hskefwnvhexyuer)
2018-03-21 20:30:28 arvados.cwl-runner INFO: Pipeline instance su92l-d1hrv-i2xtpb2ajshhcqc
2018-03-21 20:30:28 arvados.cwl-runner INFO: [job clt.cwl] reused job su92l-8i9sb-1a12mtzuoo4rn67
2018-03-21 20:30:40 arvados.cwl-runner INFO: Overall process status is success
2018-03-21 20:30:41 arvados.cwl-runner INFO: Final output collection 743b79a7b3702298062082fa8e09caf2+164 "Output of clt.cwl" (su92l-4zz18-jg3rqoc5uewcx8h)
{
    "result": {
        "basename": "keep:f4c6f248fcb732aea2da749c8ce66672+62",
        "location": "keep:743b79a7b3702298062082fa8e09caf2+164/keep:f4c6f248fcb732aea2da749c8ce66672+62",
        "class": "Directory"
    }
}
2018-03-21 20:30:41 cwltool INFO: Final process status is success
Notice that the location field contains a CWL keep location string used as a directory name. Both collections are real collections in Arvados.
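For anyone who wants to check their own runs for the same symptom, here is a minimal sketch that scans the JSON printed by arvados-cwl-runner for outputs whose basename itself looks like a keep: reference. The function name and the regular expression are my own assumptions (the pattern only covers the "32 hex digits, plus sign, size" form seen in this report); nothing here is part of the Arvados API.

```python
import json
import re

# Assumption: a keep reference starts with "keep:", 32 hex digits, "+", and a size,
# which is the form that appears in the output above.
KEEP_REF = re.compile(r"^keep:[0-9a-f]{32}\+\d+")


def suspicious_entries(cwl_output):
    """Return (output name, basename) pairs whose basename looks like a keep: reference."""
    found = []
    for name, value in cwl_output.items():
        if isinstance(value, dict) and KEEP_REF.match(value.get("basename", "")):
            found.append((name, value["basename"]))
    return found


# The JSON object copied from the run above.
output_json = """
{
    "result": {
        "basename": "keep:f4c6f248fcb732aea2da749c8ce66672+62",
        "location": "keep:743b79a7b3702298062082fa8e09caf2+164/keep:f4c6f248fcb732aea2da749c8ce66672+62",
        "class": "Directory"
    }
}
"""

print(suspicious_entries(json.loads(output_json)))
# prints [('result', 'keep:f4c6f248fcb732aea2da749c8ce66672+62')]
```

The check only looks at top-level outputs; nested listings would need a recursive walk.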
More succinctly:
$ ls -laR $HOME/keep/by_id/743b79a7b3702298062082fa8e09caf2+164
/home/abram/keep/by_id/743b79a7b3702298062082fa8e09caf2+164:
total 2
dr-xr-xr-x 1 abram abram 0 Jan 1 1970 .
dr-xr-xr-x 1 abram abram 0 Feb 28 15:19 ..
-r-xr-xr-x 1 abram abram 182 Jan 1 1970 cwl.output.json
dr-xr-xr-x 1 abram abram 0 Jan 1 1970 keep:f4c6f248fcb732aea2da749c8ce66672+62
/home/abram/keep/by_id/743b79a7b3702298062082fa8e09caf2+164/keep:f4c6f248fcb732aea2da749c8ce66672+62:
total 2
dr-xr-xr-x 1 abram abram 0 Jan 1 1970 .
dr-xr-xr-x 1 abram abram 0 Jan 1 1970 ..
-r-xr-xr-x 1 abram abram 6 Jan 1 1970 hello.txt
-r-xr-xr-x 1 abram abram 3 Jan 1 1970 ok.txt
And the other collection is also valid:
$ ls -laR $HOME/keep/by_id/f4c6f248fcb732aea2da749c8ce66672+62
/home/abram/keep/by_id/f4c6f248fcb732aea2da749c8ce66672+62:
total 2
dr-xr-xr-x 1 abram abram 0 Jan 1 1970 .
dr-xr-xr-x 1 abram abram 0 Feb 28 15:19 ..
-r-xr-xr-x 1 abram abram 6 Jan 1 1970 hello.txt
-r-xr-xr-x 1 abram abram 3 Jan 1 1970 ok.txt
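The two listings above can also be compared programmatically. This is a rough sketch, assuming Keep is mounted at $HOME/keep as in the listings; the helper names are made up for illustration, and it compares file names only, not file contents.

```python
import os


def relative_files(root):
    """Return the sorted relative paths of all regular files under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            paths.append(os.path.relpath(full, root))
    return sorted(paths)


def same_listing(dir_a, dir_b):
    """True when both directories contain the same relative file paths."""
    return relative_files(dir_a) == relative_files(dir_b)


if __name__ == "__main__":
    # Assumption: Keep is FUSE-mounted at ~/keep, as in the listings above.
    by_id = os.path.expanduser("~/keep/by_id")
    inner = "f4c6f248fcb732aea2da749c8ce66672+62"
    nested = os.path.join(by_id, "743b79a7b3702298062082fa8e09caf2+164", "keep:" + inner)
    standalone = os.path.join(by_id, inner)
    if os.path.isdir(nested) and os.path.isdir(standalone):
        print(same_listing(nested, standalone))
```

Comparing the nested directory (rather than the top of the output collection) skips cwl.output.json, which exists only in the final output collection.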
Here is the CWL:
cwlVersion: v1.0
class: CommandLineTool
$namespaces:
  arv: "http://arvados.org/cwl#"
requirements:
  - class: DockerRequirement
    dockerPull: arvados/l7g
  - class: ResourceRequirement
    coresMin: 1
  - class: arv:RuntimeConstraints
    keep_cache: 10000
baseCommand: bash
inputs:
  script:
    type: File
    inputBinding:
      position: 1
outputs:
  result:
    type: Directory
    outputBinding:
      glob: "."
Here is the YAML:
script:
  class: File
  path: ../src/create-simple-files.sh
Here is the script create-simple-files.sh:
#!/bin/bash
echo "OK" > ok.txt
echo "hello" > hello.txt
When looking at the dashboard on su92l, I cannot tell whether, or where, the output collection 743b79a7b3702298062082fa8e09caf2+164 appears for the displayed pipeline. The pipeline, as reported by the dashboard on su92l, links to the collection f4c6f248fcb732aea2da749c8ce66672+62 (the one without the weird parent directory).
Though I've labeled this as a bug, I'm not sure it actually is one; this might be the desired behavior. Regardless, it was surprising to me, and it's not obvious what the rules are for the format of the output collection as reported by the location field in the JSON output.