Bug #14726

open

[CWL] Propagating input file to output gets confusing error

Added by Peter Amstutz almost 6 years ago. Updated 10 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Story points: -
Release:
Release relationship: Auto

Description

The bcbio step prep_samples_to_rec takes GRCh37.fa with secondaryFiles as input and then returns that same file in cwl.output.json. However, the returned File object lists several additional secondary files. From the perspective of arvados-cwl-runner, these files have appeared out of nowhere: they were not declared in the inputs, and they are not found in the output directory. This is not detected as a user error. Instead it surfaces as a KeyError raised from the path mapper, followed by "Output is missing expected field", so the message is extremely confusing and does not tell the user how to fix the problem.

2019-01-11 14:54:53 cwltool DEBUG: [job prep_samples_to_rec] initializing from file:///home/peter/work/tmp/kfang/workflow.json#prep_samples_to_rec.cwl as part of step prep_samples_to_rec
2019-01-11 14:54:53 cwltool DEBUG: [job prep_samples_to_rec] {
    "rgnames__sample": [
        "RMNISTHS_30xdownsample" 
    ], 
    "reference__fasta__base": [
        {
            "basename": "GRCh37.fa", 
            "nameroot": "GRCh37", 
            "nameext": ".fa", 
            "location": "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa", 
            "secondaryFiles": [
                {
                    "basename": "GRCh37.fa.fai", 
                    "nameroot": "GRCh37.fa", 
                    "nameext": ".fai", 
                    "location": "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai", 
                    "class": "File", 
                    "size": 2746
                }, 
                {
                    "basename": "GRCh37.dict", 
                    "nameroot": "GRCh37", 
                    "nameext": ".dict", 
                    "location": "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict", 
                    "class": "File", 
                    "size": 10257
                }
            ], 
            "class": "File", 
            "size": 3153506519
        }
    ], 
    "config__algorithm__variant_regions": [
        null
    ], 
    "description": [
        "RMNISTHS_30xdownsample" 
    ], 
    "resources": [
        "{}" 
    ]
}
2019-01-11 14:54:53 arvados.arv-run INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
2019-01-11 14:54:53 cwltool DEBUG: [job prep_samples_to_rec] path mappings is {
    "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa": [
        "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa", 
        "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa", 
        "File", 
        true
    ], 
    "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict": [
        "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict", 
        "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict", 
        "File", 
        true
    ], 
    "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai": [
        "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai", 
        "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai", 
        "File", 
        true
    ]
}

2019-01-11 14:55:05 cwltool DEBUG: Raw output from keep:dc8d284de7b3e4743524020de33c2799+290/cwl.output.json: {
    "prep_samples_rec": [
        {
            "rgnames__sample": "RMNISTHS_30xdownsample", 
            "reference__fasta__base": {
                "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa", 
                "class": "File", 
                "secondaryFiles": [
                    {
                        "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai", 
                        "class": "File" 
                    }, 
                    {
                        "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict", 
                        "class": "File" 
                    }, 
                    {
                        "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz", 
                        "class": "File" 
                    }, 
                    {
                        "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz.gzi", 
                        "class": "File" 
                    }, 
                    {
                        "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37-resources.yaml", 
                        "class": "File" 
                    }, 
                    {
                        "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz.fai", 
                        "class": "File" 
                    }
                ]
            }, 
            "config__algorithm__variant_regions": null, 
            "description": "RMNISTHS_30xdownsample", 
            "resources": "{\"default\":{\"cores\":1,\"jvm_opts\":[\"-Xms1000m\",\"-Xmx16384m\"],\"memory\":\"16384M\"}}" 
        }
    ]
}
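
To make the mismatch easier to see, here is a minimal sketch that compares the secondaryFiles of the input File object with those in the raw output logged above. The helper names (secondary_names, undeclared_secondary_files) are hypothetical and are not part of cwltool or arvados-cwl-runner; the File objects are abbreviated copies of the logged ones.

```python
import posixpath


def secondary_names(file_obj):
    """Return the basenames of a CWL File object's secondaryFiles.

    Inputs use "location" and cwl.output.json uses "path", so accept either.
    """
    names = set()
    for sf in file_obj.get("secondaryFiles", []):
        ref = sf.get("location") or sf.get("path") or ""
        names.add(posixpath.basename(ref))
    return names


def undeclared_secondary_files(input_file, output_file):
    """List secondary files present in the output but absent from the input."""
    return sorted(secondary_names(output_file) - secondary_names(input_file))


# Abbreviated from the logs in this report.
KEEP = "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/"
MOUNT = "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/"

declared = {
    "class": "File",
    "location": KEEP + "GRCh37.fa",
    "secondaryFiles": [
        {"class": "File", "location": KEEP + "GRCh37.fa.fai"},
        {"class": "File", "location": KEEP + "GRCh37.dict"},
    ],
}

returned = {
    "class": "File",
    "path": MOUNT + "GRCh37.fa",
    "secondaryFiles": [
        {"class": "File", "path": MOUNT + name}
        for name in (
            "GRCh37.fa.fai",
            "GRCh37.dict",
            "GRCh37.fa.gz",
            "GRCh37.fa.gz.gzi",
            "GRCh37-resources.yaml",
            "GRCh37.fa.gz.fai",
        )
    ],
}

extra = undeclared_secondary_files(declared, returned)
print(extra)
```

For this job the four extra entries are GRCh37.fa.gz, GRCh37.fa.gz.gzi, GRCh37-resources.yaml and GRCh37.fa.gz.fai. The first of these is the key reported in the KeyError.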

2019-01-11 14:55:05 arvados.cwl-runner ERROR: [container prep_samples_to_rec] while getting output object: u'keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz'
Traceback (most recent call last):
  File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/arvcontainer.py", line 350, in done
    outputs = done.done_outputs(self, container, "/tmp", self.outdir, "/keep")
  File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/done.py", line 53, in done_outputs
    return self.collect_outputs("keep:" + record["output"])
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/command_line_tool.py", line 616, in collect_output_ports
    visit_class(ret, ("File", "Directory"), cast(Callable[[Any], Any], revmap))
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 214, in visit_class
    visit_class(rec[d], cls, op)
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 217, in visit_class
    visit_class(d, cls, op)
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 214, in visit_class
    visit_class(rec[d], cls, op)
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 214, in visit_class
    visit_class(rec[d], cls, op)
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 217, in visit_class
    visit_class(d, cls, op)
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 212, in visit_class
    op(rec)
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/command_line_tool.py", line 159, in revmap_file
    if revmap_f and not builder.pathmapper.mapper(revmap_f[0]).type.startswith("Writable"):
  File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/pathmapper.py", line 318, in mapper
    return self._pathmap[src]
KeyError: u'keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz'
2019-01-11 14:55:05 cwltool ERROR: [step prep_samples_to_rec] Output is missing expected field file:///home/peter/work/tmp/kfang/workflow.json#main/prep_samples_to_rec/prep_samples_rec
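
One possible direction, sketched under assumptions rather than as the actual fix: before reverse-mapping, walk the output object and check that every file under the Keep mount corresponds to a known path-map key, then report all offenders in a single user-facing message. The function names, the "/keep/" prefix handling, and the message wording below are illustrative only and do not reflect the real cwltool or arvados-cwl-runner code.

```python
def iter_files(obj):
    """Yield every dict with class "File" nested anywhere in obj."""
    if isinstance(obj, dict):
        if obj.get("class") == "File":
            yield obj
        for value in obj.values():
            yield from iter_files(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_files(item)


def check_keep_outputs(output_obj, pathmap, keep_mount="/keep/"):
    """Raise ValueError naming output files under keep_mount that are unmapped.

    pathmap is assumed to be keyed by "keep:..." locations, as in the
    "path mappings" debug output. Files outside keep_mount are not checked.
    """
    missing = []
    for f in iter_files(output_obj):
        path = f.get("path", "")
        if not path.startswith(keep_mount):
            continue
        location = "keep:" + path[len(keep_mount):]
        if location not in pathmap:
            missing.append(location)
    if missing:
        raise ValueError(
            "cwl.output.json references files that were not declared as "
            "inputs and are not in the output directory: "
            + ", ".join(missing)
            + ". Declare them as secondaryFiles on the input, or write them "
            "to the output directory."
        )


# Abbreviated example based on the logs in this report.
base = "b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/"
pathmap = {
    "keep:" + base + "GRCh37.fa": None,
    "keep:" + base + "GRCh37.fa.fai": None,
    "keep:" + base + "GRCh37.dict": None,
}
output = {
    "prep_samples_rec": [{
        "reference__fasta__base": {
            "class": "File",
            "path": "/keep/" + base + "GRCh37.fa",
            "secondaryFiles": [
                {"class": "File", "path": "/keep/" + base + "GRCh37.fa.fai"},
                {"class": "File", "path": "/keep/" + base + "GRCh37.fa.gz"},
            ],
        }
    }]
}

try:
    check_keep_outputs(output, pathmap)
except ValueError as err:
    print(err)
```

Collecting every unmapped file before raising means the user sees the whole list at once instead of fixing one KeyError per run.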