Story #15535

Updated by Peter Amstutz over 2 years ago

arvados-cwl-runner, on submitting a workflow, uses "pack" to create a single-stream document.

This is because the "workflow" record used to display workflows on workbench stores only a single raw text field, into which the multi-document CWL file has to be stuffed. The rationale for making the workflow record a text field, rather than a PDH or git hash, is to avoid requiring workbench to fetch a collection or git repo in order to display a workflow. Although this limitation doesn't apply when submitting from the command line, the command-line path also uses the "pack" function to avoid maintaining multiple code paths.

Unfortunately, the packed version often bears little resemblance to the user's original document. It would be better to execute the original document.

Proposal:

At CLI: Upload original workflow files & dependencies to a collection, preserving original filesystem structure. Submit a container request that mounts the collection and runs the workflow.

At workbench: to register a workflow record, create a wrapper workflow that has the same input/output interface as the original workflow, with a single step whose run line looks like:

> run: keep:pdh/workflow
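As a rough illustration of the wrapper idea, the sketch below builds such a wrapper from an already-parsed CWL Workflow. It is not taken from arvados-cwl-runner: the function name `make_wrapper`, the step name `main`, the sample workflow, and the PDH value are all made up for the example. It assumes `inputs` and `outputs` are in CWL list form, and it adds `SubworkflowFeatureRequirement` because the step runs a Workflow.

```python
def make_wrapper(workflow, pdh, path):
    """Return a wrapper Workflow dict mirroring `workflow`'s interface.

    Hypothetical helper. `workflow` is a parsed CWL Workflow whose
    `inputs` and `outputs` are lists of {"id": ..., "type": ...}.
    """
    step_name = "main"  # arbitrary step name chosen for this sketch

    # Same input interface as the original workflow.
    inputs = [{"id": i["id"], "type": i["type"]} for i in workflow["inputs"]]

    # Same output interface; each output is wired to the single step.
    outputs = [
        {
            "id": o["id"],
            "type": o["type"],
            "outputSource": f"{step_name}/{o['id']}",
        }
        for o in workflow["outputs"]
    ]

    # The single step points at the original workflow inside the collection.
    step = {
        "id": step_name,
        "run": f"keep:{pdh}/{path}",
        "in": [{"id": i["id"], "source": i["id"]} for i in workflow["inputs"]],
        "out": [o["id"] for o in workflow["outputs"]],
    }

    return {
        # Falling back to v1.0 is an assumption of this sketch.
        "cwlVersion": workflow.get("cwlVersion", "v1.0"),
        "class": "Workflow",
        # Needed because the step's `run` target is itself a Workflow.
        "requirements": [{"class": "SubworkflowFeatureRequirement"}],
        "inputs": inputs,
        "outputs": outputs,
        "steps": [step],
    }


# Made-up original workflow and portable data hash, for illustration only.
original = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "inputs": [{"id": "reads", "type": "File"}],
    "outputs": [{"id": "report", "type": "File", "outputSource": "qc/report"}],
    "steps": [],
}
wrapper = make_wrapper(
    original, "0123456789abcdef0123456789abcdef+123", "workflow.cwl"
)
```

Only the interface is copied; the original steps stay in the collection, so the stored record remains small and the user's files are executed unmodified.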

To submit the workflow, workbench introspects the step and sets up the correct collection mount.
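One way the introspection could work is sketched below: parse the step's `keep:pdh/path` reference and derive a collection mount for the container request. The helper name `collection_mount_for_step` and the mount point `/var/lib/cwl/workflow` are assumptions made for this example, not existing workbench behaviour. The mount entry uses the `kind: collection` / `portable_data_hash` form of Arvados container request mounts.

```python
import re

# keep:<32 hex digits>+<size>/<path inside the collection>
KEEP_REF = re.compile(r"^keep:([0-9a-f]{32}\+\d+)/(.+)$")


def collection_mount_for_step(step, mount_point="/var/lib/cwl/workflow"):
    """Return (mounts, workflow_path) for a wrapper step.

    Hypothetical helper; `mount_point` is an arbitrary choice for this sketch.
    """
    match = KEEP_REF.match(step["run"])
    if match is None:
        raise ValueError(f"not a keep reference: {step['run']!r}")
    pdh, path = match.groups()

    # Mount the collection holding the original workflow files.
    mounts = {
        mount_point: {"kind": "collection", "portable_data_hash": pdh},
    }
    # Path of the workflow document as seen inside the container.
    return mounts, f"{mount_point}/{path}"
```

The returned path is what the runner inside the container would be pointed at, so relative references between the original files keep resolving as they did on the user's filesystem.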

To display cwl-svg of a workflow, workbench2 needs to be able to fetch the files from keep-web.

To maximize reuse, dependencies of each CommandLineTool are still copied to separate collections.
