Story #8488

[Microsoft] Democratize running bcbio CWL on qr1hi

Added by Brad Chapman over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Documentation
Target version:
Start date: 02/18/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

As part of the Microsoft work, we need to demonstrate running variant calling and validation using bcbio and CWL (#8382 and #8381). We plan to do this on qr1hi using Peter's work with cwl-runner (#8176).

From the bcbio side, the latest test CWL is available from:

https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz

with more documentation here:

https://github.com/chapmanb/bcbio-nextgen/tree/master/cwl

We need to document the process and train Brad on how to run on qr1hi so he can test and iterate on new versions.


Subtasks

Task #8489: Run crunchrunner from collection, provide collection in job parameters (Resolved, Peter Amstutz)

Task #8521: Review 8488-cwl-crunchrunner-collection (Resolved, Brad Chapman)

Associated revisions

Revision 9e5b98e8
Added by Peter Amstutz over 5 years ago

Merge branch '8488-cwl-crunchrunner-collection' closes #8488

History

#1 Updated by Brad Chapman over 5 years ago

Peter;
Thanks for the tour of installing and testing this on Friday. I've set up cwl-runner from your branch and was able to start an initial test run. Awesome.

I'm running into an issue where I think the bcbio/bcbio image on qr1hi (https://cloud.curoverse.com/collections/qr1hi-4zz18-doidmcskcmhn2bm) is out of date. How do we refresh it to the latest version?

The run I got started is here:

https://cloud.curoverse.com/pipeline_instances/qr1hi-d1hrv-nybexwq0vehhuu4

and was failing with this error:

2016-02-27 18:42:32 arvados.cwl-runner[5027] ERROR: Got exception while collecting job outputs:
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/arvados_cwl/__init__.py", line 197, in done
    outputs = self.collect_outputs(self.builder.outdir)
  File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 235, in collect_output_ports
    ret[fragment] = self.collect_output(port, builder, outdir)
  File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 316, in collect_output
    adjustFileObjs(r, revmap)
  File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 154, in adjustFileObjs
    adjustFileObjs(d, op)
  File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 154, in adjustFileObjs
    adjustFileObjs(d, op)
  File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 149, in adjustFileObjs
    op(rec)
  File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 78, in revmap_file
    raise WorkflowException("Output file path %s must be within designated output directory (%s) or an input file pass through." % (f["path"], builder.outdir))
WorkflowException: Output file path align_prep/7_100326_FC6107FAAXX-1.fq.gz must be within designated output directory (keep:d586abc216dd7011f2e57eecc674f804+469) or an input file pass through.

which I believe is due to relative paths in the output JSON. This was fixed in bcbio a couple of weeks ago, along with a corresponding fix to cwltool (https://github.com/common-workflow-language/cwltool/pull/40), so I hope a refresh of the container will fix it. It would also be great if I could update the container on demand, as the latest version also contains a lot of new functionality for the Microsoft work (variant calling, validation, SNAP support) that will probably need a few more iterations.
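A minimal sketch of the failure mode, assuming a simple prefix-based containment check. This is not cwltool's actual `revmap_file` code, and the exact change made in cwltool PR #40 is not described here; all helper names are hypothetical. It shows that a relative path such as `align_prep/7_100326_FC6107FAAXX-1.fq.gz` fails a naive prefix test against the `keep:` output directory, and passes once it is resolved against that directory first:

```python
import posixpath


def naive_within(path: str, outdir: str) -> bool:
    # Hypothetical naive check: only accepts paths that literally start with outdir.
    return path.startswith(outdir.rstrip("/") + "/")


def resolve_output_path(path: str, outdir: str) -> str:
    # Hypothetical helper: treat relative paths as relative to outdir;
    # absolute paths and keep: references are returned unchanged.
    if posixpath.isabs(path) or path.startswith("keep:"):
        return path
    return posixpath.join(outdir, path)


def is_within_outdir(path: str, outdir: str) -> bool:
    # Resolve and normalize first, so "../" segments cannot escape outdir.
    resolved = posixpath.normpath(resolve_output_path(path, outdir))
    base = posixpath.normpath(outdir)
    return resolved == base or resolved.startswith(base + "/")
```

Resolving before comparing, rather than string-matching the raw path, also keeps a path like `../escape.txt` from slipping outside the output directory.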

For reference, the up-to-date CWL I'm running is here:

https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz

I've also written up skeleton documentation on running this and will push that out once I've got a working run. Thanks again for this, I'm excited to have this so close to running.

#2 Updated by Peter Amstutz over 5 years ago

To update bcbio in Arvados, try "arv-keepdocker bcbio/bcbio"
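A minimal sketch of that refresh as a reusable script, assuming `docker` and `arv-keepdocker` are on PATH and Arvados credentials (`ARVADOS_API_HOST`, `ARVADOS_API_TOKEN`) are already configured. Because `arv-keepdocker` uploads the image from the local Docker daemon, the sketch pulls the newest image first so a stale local copy is not what gets uploaded. The function names are hypothetical:

```python
import shutil
import subprocess
from typing import List


def refresh_commands(image: str = "bcbio/bcbio", tag: str = "latest") -> List[List[str]]:
    # Hypothetical helper: build the two commands without running them.
    return [
        # Make sure the local Docker daemon has the newest image.
        ["docker", "pull", f"{image}:{tag}"],
        # Upload that local image into Arvados (image name, then tag).
        ["arv-keepdocker", image, tag],
    ]


def refresh_image(image: str = "bcbio/bcbio", tag: str = "latest") -> None:
    # Run each command in order, stopping on the first failure.
    for cmd in refresh_commands(image, tag):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
```

Calling `refresh_image()` would run both steps for `bcbio/bcbio:latest`, which makes re-updating the container on demand a single call.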

#3 Updated by Brad Chapman over 5 years ago

Peter;
Thanks for the tip, and for updating the bcbio Docker image (with the right local version of Docker). It looks like I got the latest image but am still running into the same issue:

2016-02-29 14:49:06 arvados.cwl-runner[19734] INFO: Job prep_align_inputs (qr1hi-8i9sb-zmvw2u2jdpertue) is Complete
2016-02-29 14:49:07 arvados.cwl-runner[19734] ERROR: Got exception while collecting job outputs:
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/arvados_cwl/__init__.py", line 197, in done
    outputs = self.collect_outputs(self.builder.outdir)
  File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 235, in collect_output_ports
    ret[fragment] = self.collect_output(port, builder, outdir)
  File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 316, in collect_output
    adjustFileObjs(r, revmap)
  File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 154, in adjustFileObjs
    adjustFileObjs(d, op)
  File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 154, in adjustFileObjs
    adjustFileObjs(d, op)
  File "build/bdist.linux-x86_64/egg/cwltool/process.py", line 149, in adjustFileObjs
    op(rec)
  File "build/bdist.linux-x86_64/egg/cwltool/draft2tool.py", line 78, in revmap_file
    raise WorkflowException("Output file path %s must be within designated output directory (%s) or an input file pass through." % (f["path"], builder.outdir))
WorkflowException: Output file path /tmp/crunch-job-task-work/compute2.1/outdir/align_prep/7_100326_FC6107FAAXX-1.fq.gz must be within designated output directory (keep:ed5abc7f4ed6c6771b68c208a8d10680+442) or an input file pass through.

It's now specifying the full output path instead of a relative path (/tmp/crunch-job-task-work/compute2.1/outdir/align_prep/7_100326_FC6107FAAXX-1.fq.gz), but that isn't getting translated back into a Keep reference, so it barfs. Any ideas about how to proceed? Thanks again.
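A minimal sketch of the translation the error implies is missing: mapping a path under the job's local output directory back into the Keep output collection. This is illustrative only. It assumes the runner knows the local output directory (here copied from the traceback; it will differ between nodes such as compute2.1), the helper name is hypothetical, and it is not necessarily how the issue was resolved (see the subtask and changeset referenced on this story):

```python
import posixpath


def remap_to_keep(path: str, local_outdir: str, keep_outdir: str) -> str:
    # Hypothetical helper: rewrite a local output path into a keep: reference.
    local = posixpath.normpath(local_outdir)
    p = posixpath.normpath(path)
    if p == local:
        return keep_outdir
    if p.startswith(local + "/"):
        # Keep the part relative to the output directory and append it to the collection.
        return keep_outdir.rstrip("/") + "/" + p[len(local) + 1:]
    raise ValueError(f"{path} is not under {local_outdir}")
```

With this mapping, the path from the traceback would become `keep:ed5abc7f4ed6c6771b68c208a8d10680+442/align_prep/7_100326_FC6107FAAXX-1.fq.gz`, which does lie within the designated output directory.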

#4 Updated by Peter Amstutz over 5 years ago

  • Status changed from New to In Progress

#5 Updated by Brett Smith over 5 years ago

  • Target version changed from 2016-03-02 sprint to 2016-03-16 sprint
  • Story points changed from 1.0 to 0.5

#6 Updated by Peter Amstutz over 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:9e5b98e8f5f4727856b53447191f9c06e3da2ba6.
