Bug #13256

closed

Weird directory structure for CommandLineTool CWL pipeline

Added by Abram Connelly about 6 years ago. Updated about 4 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

When running a simple CommandLineTool pipeline in CWL on Arvados, the final output is placed inside an extra parent directory whose name appears to be a CWL location string. Apart from that extra parent directory, the files themselves are as expected, and the CWL pipeline ran successfully.

Here is a local run of arvados-cwl-runner:

$ arvados-cwl-runner --local --project-uuid su92l-j7d0g-ucwsoqnhrchk231 cwl/clt.cwl yml/clt.yml
2018-03-21 20:30:26 cwltool INFO: /usr/bin/arvados-cwl-runner 1.0.20180223182850, arvados-python-client 0.1.20180223161544, cwltool 1.0.20180130110340
2018-03-21 20:30:26 cwltool INFO: Resolved 'cwl/clt.cwl' to 'file:///home/abram/cwl/clt/cwl/clt.cwl'
2018-03-21 20:30:27 arvados.arv-run INFO: Upload local files: "create-simple-files.sh" 
2018-03-21 20:30:28 arvados.arv-run INFO: Uploaded to 741010cc1967c84c3e191a6114cfff1a+66 (su92l-4zz18-hskefwnvhexyuer)
2018-03-21 20:30:28 arvados.cwl-runner INFO: Pipeline instance su92l-d1hrv-i2xtpb2ajshhcqc
2018-03-21 20:30:28 arvados.cwl-runner INFO: [job clt.cwl] reused job su92l-8i9sb-1a12mtzuoo4rn67
2018-03-21 20:30:40 arvados.cwl-runner INFO: Overall process status is success
2018-03-21 20:30:41 arvados.cwl-runner INFO: Final output collection 743b79a7b3702298062082fa8e09caf2+164 "Output of clt.cwl" (su92l-4zz18-jg3rqoc5uewcx8h)
{
    "result": {
        "basename": "keep:f4c6f248fcb732aea2da749c8ce66672+62", 
        "location": "keep:743b79a7b3702298062082fa8e09caf2+164/keep:f4c6f248fcb732aea2da749c8ce66672+62", 
        "class": "Directory" 
    }
}
2018-03-21 20:30:41 cwltool INFO: Final process status is success

Notice that the location uses a CWL keep: location string as a directory name. Both collections are real collections in Arvados.

More concretely:

$ ls -laR $HOME/keep/by_id/743b79a7b3702298062082fa8e09caf2+164
/home/abram/keep/by_id/743b79a7b3702298062082fa8e09caf2+164:
total 2
dr-xr-xr-x 1 abram abram   0 Jan  1  1970 .
dr-xr-xr-x 1 abram abram   0 Feb 28 15:19 ..
-r-xr-xr-x 1 abram abram 182 Jan  1  1970 cwl.output.json
dr-xr-xr-x 1 abram abram   0 Jan  1  1970 keep:f4c6f248fcb732aea2da749c8ce66672+62

/home/abram/keep/by_id/743b79a7b3702298062082fa8e09caf2+164/keep:f4c6f248fcb732aea2da749c8ce66672+62:
total 2
dr-xr-xr-x 1 abram abram 0 Jan  1  1970 .
dr-xr-xr-x 1 abram abram 0 Jan  1  1970 ..
-r-xr-xr-x 1 abram abram 6 Jan  1  1970 hello.txt
-r-xr-xr-x 1 abram abram 3 Jan  1  1970 ok.txt
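The location in the JSON output has the form keep:<portable data hash>/<path inside the collection>. The Python sketch below splits it and maps it onto the by_id layout of the Keep mount used in the listing above. The helper names split_keep_location and mount_path are hypothetical, not part of any Arvados SDK. The resulting path is the second directory header in that listing, and the "path" part is itself a complete keep: location for the other collection.

```python
import posixpath
from typing import Tuple

KEEP_PREFIX = "keep:"


def split_keep_location(location: str) -> Tuple[str, str]:
    """Split 'keep:<pdh>/<path>' into (portable data hash, path inside the collection).

    Hypothetical helper for illustration only.
    """
    if not location.startswith(KEEP_PREFIX):
        raise ValueError(f"not a keep: location: {location!r}")
    pdh, _, path = location[len(KEEP_PREFIX):].partition("/")
    return pdh, path


def mount_path(location: str, mount_root: str) -> str:
    """Map a keep: location onto a <mount_root>/by_id/<pdh>[/<path>] path (hypothetical helper)."""
    pdh, path = split_keep_location(location)
    parts = [mount_root, "by_id", pdh] + ([path] if path else [])
    return posixpath.join(*parts)


location = "keep:743b79a7b3702298062082fa8e09caf2+164/keep:f4c6f248fcb732aea2da749c8ce66672+62"
pdh, inner = split_keep_location(location)
print(pdh)    # 743b79a7b3702298062082fa8e09caf2+164
print(inner)  # keep:f4c6f248fcb732aea2da749c8ce66672+62
# The path segment is itself a complete keep: location for the other collection.
print(split_keep_location(inner))  # ('f4c6f248fcb732aea2da749c8ce66672+62', '')
print(mount_path(location, "/home/abram/keep"))
```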

And the other collection is also valid:

$ ls -laR $HOME/keep/by_id/f4c6f248fcb732aea2da749c8ce66672+62
/home/abram/keep/by_id/f4c6f248fcb732aea2da749c8ce66672+62:
total 2
dr-xr-xr-x 1 abram abram 0 Jan  1  1970 .
dr-xr-xr-x 1 abram abram 0 Feb 28 15:19 ..
-r-xr-xr-x 1 abram abram 6 Jan  1  1970 hello.txt
-r-xr-xr-x 1 abram abram 3 Jan  1  1970 ok.txt
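The +62 and +164 suffixes on the two portable data hashes are the byte lengths of each collection's manifest text (a portable data hash is the MD5 of the signature-free manifest text, then "+", then its length). The Python sketch below rebuilds plausible manifests under stated assumptions and checks those lengths. It assumes hello.txt (6 bytes) and ok.txt (3 bytes) were packed into one 9-byte block, and it uses a placeholder hash for the 182-byte cwl.output.json, whose contents aren't shown. Under those assumptions, both lengths come out right, and the outer manifest contains a stream literally named ./keep:f4c6f248fcb732aea2da749c8ce66672+62.

```python
import hashlib


def block_locator(data: bytes) -> str:
    # Keep block locator: md5 hex digest of the block, "+", block size in bytes.
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def portable_data_hash(manifest: str) -> str:
    # md5 of the signature-free manifest text, "+", manifest length in bytes.
    text = manifest.encode()
    return f"{hashlib.md5(text).hexdigest()}+{len(text)}"


# Contents written by create-simple-files.sh (echo appends a newline).
hello, ok = b"hello\n", b"OK\n"
# ASSUMPTION: both files packed into a single 9-byte block, hello.txt first.
files_stream = f"{block_locator(hello + ok)} 0:6:hello.txt 6:3:ok.txt\n"

inner_manifest = ". " + files_stream

# ASSUMPTION / placeholder: cwl.output.json contents are not shown, so use a dummy
# 32-hex-digit hash; any real md5 gives a locator of the same length.
cwl_json_locator = "0" * 32 + "+182"
outer_manifest = (
    f". {cwl_json_locator} 0:182:cwl.output.json\n"
    f"./keep:f4c6f248fcb732aea2da749c8ce66672+62 {files_stream}"
)

print(len(inner_manifest.encode()))  # 62
print(len(outer_manifest.encode()))  # 164
print(portable_data_hash(inner_manifest))
```

Only the size suffix is checked here. The md5 part of the inner hash would match the real collection only if the assumed block layout is exact.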

Here is the CWL:

cwlVersion: v1.0
class: CommandLineTool
$namespaces:
  arv: "http://arvados.org/cwl#" 
requirements:
  - class: DockerRequirement
    dockerPull: arvados/l7g
  - class: ResourceRequirement
    coresMin: 1
  - class: arv:RuntimeConstraints
    keep_cache: 10000

baseCommand: bash

inputs:
  script:
    type: File
    inputBinding:
      position: 1

outputs:
  result:
    type: Directory
    outputBinding:
      glob: "." 
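The report doesn't say why the extra directory appears. One plausible explanation, offered only as a hypothesis: with glob "." the matched Directory is the tool's whole output collection, whose location is a bare keep:<pdh> with no path after it. If the Directory's basename is derived by splitting that location on "/", the result is the entire location string, which matches the basename in the JSON output. Placing the Directory under that basename in the final collection would then give the nested location that was reported. A Python sketch of that effect:

```python
import posixpath

# Collection root: a bare keep: location with no path segment.
root_location = "keep:f4c6f248fcb732aea2da749c8ce66672+62"
# HYPOTHETICAL subdirectory location, for comparison only.
sub_location = "keep:f4c6f248fcb732aea2da749c8ce66672+62/outdir"

# Splitting on "/" finds nothing to strip from a bare collection location.
print(posixpath.basename(root_location))  # keep:f4c6f248fcb732aea2da749c8ce66672+62
print(posixpath.basename(sub_location))   # outdir

# HYPOTHESIS: nesting the output Directory under its basename inside the final
# output collection reproduces the location reported by arvados-cwl-runner.
final_pdh = "743b79a7b3702298062082fa8e09caf2+164"
nested = f"keep:{final_pdh}/{posixpath.basename(root_location)}"
print(nested)  # keep:743b79a7b3702298062082fa8e09caf2+164/keep:f4c6f248fcb732aea2da749c8ce66672+62
```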

Here is the YAML:

script:
  class: File
  path: ../src/create-simple-files.sh

Here is the script create-simple-files.sh:

#!/bin/bash

echo "OK" > ok.txt
echo "hello" > hello.txt
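The file sizes in the ls listings (6 bytes for hello.txt, 3 bytes for ok.txt) follow from echo appending a trailing newline. A minimal Python sketch that writes the same contents into a temporary directory, standing in for the tool's output directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as outdir:
    # Equivalent of: echo "OK" > ok.txt ; echo "hello" > hello.txt
    for name, text in [("ok.txt", "OK"), ("hello.txt", "hello")]:
        with open(os.path.join(outdir, name), "w", newline="\n") as f:
            f.write(text + "\n")  # echo adds a trailing newline
    # Collect sizes before the temporary directory is removed.
    sizes = {
        name: os.path.getsize(os.path.join(outdir, name))
        for name in sorted(os.listdir(outdir))
    }

print(sizes)  # {'hello.txt': 6, 'ok.txt': 3}
```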

When looking at the dashboard on su92l, I cannot see whether or where the output collection 743b79a7b3702298062082fa8e09caf2+164 appears for the displayed pipeline. The pipeline, as reported by the dashboard on su92l, links to the collection f4c6f248fcb732aea2da749c8ce66672+62 (the one without the weird parent directory).

Though I've labeled this as a bug, I'm not sure it is one; this might be the intended behavior. Regardless, it surprised me, and it's not obvious what rules govern the format of the output collection as reported by the location field in the JSON output.

#1

Updated by Peter Amstutz about 4 years ago

  • Status changed from New to Closed