Bug #14726

Updated by Peter Amstutz over 5 years ago

bcbio prep_samples_to_rec takes GRCh37.fa with secondaryFiles and then returns that same file in cwl.output.json. However, by the time it is returned, several secondary files have been added to it (GRCh37.fa.gz, GRCh37.fa.gz.gzi, GRCh37.fa.gz.fai and GRCh37-resources.yaml). From the perspective of arvados-cwl-runner, these files have appeared out of nowhere: they were not declared in the inputs, and they are not found in the output directory. Instead of being reported as a user error, this causes an internal failure (an unhandled KeyError while collecting outputs). The resulting message is extremely confusing and does not tell the user how to fix the problem.
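The traceback below ends in a plain dictionary lookup (self._pathmap[src] in cwltool's pathmapper.py). The sketch below is a simplified, hypothetical model of that failure, not the actual arvados-cwl-runner code. It assumes that a "/keep/..." path in the output is turned back into a "keep:..." URI by prefix substitution and then looked up in a map built only from the staged inputs. The names pathmap, reverse_map and lookup are illustrative.

```python
# Hypothetical, simplified model of the failure: the path map only knows
# about files that were declared (and therefore staged) as inputs.
from typing import Dict, Tuple

SEQ_DIR = "b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/"

# Input locations taken from the "path mappings" section of the log.
# Values are (resolved container path, type).
pathmap: Dict[str, Tuple[str, str]] = {
    "keep:" + SEQ_DIR + name: ("/keep/" + SEQ_DIR + name, "File")
    for name in ("GRCh37.fa", "GRCh37.dict", "GRCh37.fa.fai")
}


def reverse_map(container_path: str) -> str:
    """Assumed behavior: turn a /keep/... path back into a keep: URI."""
    prefix = "/keep/"
    if not container_path.startswith(prefix):
        raise ValueError("not a Keep mount path: " + container_path)
    return "keep:" + container_path[len(prefix):]


def lookup(container_path: str) -> Tuple[str, str]:
    # Plain dict indexing, analogous to self._pathmap[src] in the traceback,
    # so an undeclared file surfaces as a bare KeyError.
    return pathmap[reverse_map(container_path)]


print(lookup("/keep/" + SEQ_DIR + "GRCh37.fa.fai"))  # declared input: found
try:
    lookup("/keep/" + SEQ_DIR + "GRCh37.fa.gz")  # never declared as an input
except KeyError as err:
    print("KeyError:", err)
```

The KeyError message names only the missing URI. It says nothing about secondaryFiles or about what the user should change, which matches the confusion described in this report.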

 <pre> 
 2019-01-11 14:54:53 cwltool DEBUG: [job prep_samples_to_rec] initializing from file:///home/peter/work/tmp/kfang/workflow.json#prep_samples_to_rec.cwl as part of step prep_samples_to_rec 
 2019-01-11 14:54:53 cwltool DEBUG: [job prep_samples_to_rec] { 
     "rgnames__sample": [ 
         "RMNISTHS_30xdownsample" 
     ],  
     "reference__fasta__base": [ 
         { 
             "basename": "GRCh37.fa",  
             "nameroot": "GRCh37",  
             "nameext": ".fa",  
             "location": "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa",  
             "secondaryFiles": [ 
                 { 
                     "basename": "GRCh37.fa.fai",  
                     "nameroot": "GRCh37.fa",  
                     "nameext": ".fai",  
                     "location": "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai",  
                     "class": "File",  
                     "size": 2746 
                 },  
                 { 
                     "basename": "GRCh37.dict",  
                     "nameroot": "GRCh37",  
                     "nameext": ".dict",  
                     "location": "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict",  
                     "class": "File",  
                     "size": 10257 
                 } 
             ],  
             "class": "File",  
             "size": 3153506519 
         } 
     ],  
     "config__algorithm__variant_regions": [ 
         null 
     ],  
     "description": [ 
         "RMNISTHS_30xdownsample" 
     ],  
     "resources": [ 
         "{}" 
     ] 
 } 
 2019-01-11 14:54:53 arvados.arv-run INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0 
 2019-01-11 14:54:53 cwltool DEBUG: [job prep_samples_to_rec] path mappings is { 
     "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa": [ 
         "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa",  
         "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa",  
         "File",  
         true 
     ],  
     "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict": [ 
         "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict",  
         "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict",  
         "File",  
         true 
     ],  
     "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai": [ 
         "keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai",  
         "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai",  
         "File",  
         true 
     ] 
 } 


 2019-01-11 14:55:05 cwltool DEBUG: Raw output from keep:dc8d284de7b3e4743524020de33c2799+290/cwl.output.json: { 
     "prep_samples_rec": [ 
         { 
             "rgnames__sample": "RMNISTHS_30xdownsample",  
             "reference__fasta__base": { 
                 "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa",  
                 "class": "File",  
                 "secondaryFiles": [ 
                     { 
                         "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.fai",  
                         "class": "File" 
                     },  
                     { 
                         "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.dict",  
                         "class": "File" 
                     },  
                     { 
                         "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz",  
                         "class": "File" 
                     },  
                     { 
                         "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz.gzi",  
                         "class": "File" 
                     },  
                     { 
                         "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37-resources.yaml",  
                         "class": "File" 
                     },  
                     { 
                         "path": "/keep/b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz.fai",  
                         "class": "File" 
                     } 
                 ] 
             },  
             "config__algorithm__variant_regions": null,  
             "description": "RMNISTHS_30xdownsample",  
             "resources": "{\"default\":{\"cores\":1,\"jvm_opts\":[\"-Xms1000m\",\"-Xmx16384m\"],\"memory\":\"16384M\"}}" 
         } 
     ] 
 } 

 2019-01-11 14:55:05 arvados.cwl-runner ERROR: [container prep_samples_to_rec] while getting output object: u'keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz' 
 Traceback (most recent call last): 
   File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/arvcontainer.py", line 350, in done 
     outputs = done.done_outputs(self, container, "/tmp", self.outdir, "/keep") 
   File "/home/peter/.arvbox/arvbox/arvados/sdk/cwl/arvados_cwl/done.py", line 53, in done_outputs 
     return self.collect_outputs("keep:" + record["output"]) 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/command_line_tool.py", line 616, in collect_output_ports 
     visit_class(ret, ("File", "Directory"), cast(Callable[[Any], Any], revmap)) 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 214, in visit_class 
     visit_class(rec[d], cls, op) 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 217, in visit_class 
     visit_class(d, cls, op) 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 214, in visit_class 
     visit_class(rec[d], cls, op) 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 214, in visit_class 
     visit_class(rec[d], cls, op) 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 217, in visit_class 
     visit_class(d, cls, op) 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/utils.py", line 212, in visit_class 
     op(rec) 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/command_line_tool.py", line 159, in revmap_file 
     if revmap_f and not builder.pathmapper.mapper(revmap_f[0]).type.startswith("Writable"): 
   File "/home/peter/work/scripts/venv/local/lib/python2.7/site-packages/cwltool/pathmapper.py", line 318, in mapper 
     return self._pathmap[src] 
 KeyError: u'keep:b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/GRCh37.fa.gz' 
 2019-01-11 14:55:05 cwltool ERROR: [step prep_samples_to_rec] Output is missing expected field file:///home/peter/work/tmp/kfang/workflow.json#main/prep_samples_to_rec/prep_samples_rec 
 </pre> 
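One way to turn this into a user-facing error is to walk the output object before reverse-mapping it. Any File under /keep/ whose keep: location was never staged as an input would then be reported with a hint on how to fix it. The following is a hedged sketch of such a pre-check. The functions iter_files, find_undeclared and check_outputs are hypothetical and are not part of cwltool or arvados-cwl-runner. The sample data is a trimmed copy of the cwl.output.json shown in the log.

```python
# Hypothetical pre-check: report output Files under /keep/ that were never
# staged as inputs, instead of letting a bare KeyError escape.
from typing import Any, Iterable, List, Set

KEEP_PREFIX = "/keep/"


def iter_files(obj: Any) -> Iterable[dict]:
    """Yield every File object, including nested secondaryFiles."""
    if isinstance(obj, dict):
        if obj.get("class") == "File":
            yield obj
        for value in obj.values():
            yield from iter_files(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_files(item)


def find_undeclared(output: Any, declared: Set[str]) -> List[str]:
    """Return keep: locations referenced by the output but not staged."""
    undeclared = []
    for f in iter_files(output):
        path = f.get("path", "")
        # Files written to the output directory are not under /keep/ and are
        # collected normally, so only Keep paths are checked here.
        if path.startswith(KEEP_PREFIX):
            location = "keep:" + path[len(KEEP_PREFIX):]
            if location not in declared:
                undeclared.append(location)
    return undeclared


def check_outputs(output: Any, declared: Set[str]) -> None:
    missing = find_undeclared(output, declared)
    if missing:
        raise ValueError(
            "The tool returned files from Keep that were not staged as inputs:\n  "
            + "\n  ".join(missing)
            + "\nDeclare them as secondaryFiles of the corresponding input, "
            "or copy them into the output directory."
        )


SEQ = "b334527110a98f97af35dfd3912fc989+40015/GRCh37/seq/"
declared = {"keep:" + SEQ + n for n in ("GRCh37.fa", "GRCh37.fa.fai", "GRCh37.dict")}
output = {"prep_samples_rec": [{"reference__fasta__base": {
    "path": KEEP_PREFIX + SEQ + "GRCh37.fa",
    "class": "File",
    "secondaryFiles": [
        {"path": KEEP_PREFIX + SEQ + "GRCh37.fa.fai", "class": "File"},
        {"path": KEEP_PREFIX + SEQ + "GRCh37.fa.gz", "class": "File"},
    ],
}}]}

try:
    check_outputs(output, declared)
except ValueError as err:
    print(err)
```

Checking the whole output object up front lets the error list every unexpected file at once, rather than failing on whichever one the reverse mapping happens to visit first.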