Idea #15535

Updated by Peter Amstutz over 4 years ago

arvados-cwl-runner, on submitting a workflow, uses "pack" to create a single-stream document.   

 This is because the "workflow" record used to display workflows on workbench stores only a single raw text field, into which the multi-document CWL file has to be stuffed. The rationale for making the workflow record a text field, rather than a PDH or git hash, is to avoid requiring workbench to fetch a collection or git repo in order to display a workflow. Although this isn't a limitation when submitting from the command line, command-line submission also uses the "pack" function, to avoid having multiple code paths. 

 Unfortunately, the packed version often bears little resemblance to the user's original document. It would be better to execute the original document. 

 Proposal: 

 At CLI: 
 # Upload original workflow files & dependencies to a collection, preserving original filesystem structure. 
 # Submit a container request that mounts the collection and runs the workflow. 
 # Define a new scheme for workflow records that points to the collection PDH. Could also ditch the 'workflow' record entirely and just use metadata on collections to indicate that the collection stores a workflow. 
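 The upload step can be illustrated with a minimal Python sketch, assuming the workflow files are given as local paths. It only computes where each file would land inside the collection so that relative references between files keep working; it does not call the Arvados SDK. The function names and the property names (@type@, @entrypoint@) are hypothetical and not part of any existing Arvados convention.

```python
import os


def collection_layout(paths):
    """Map each local file to a collection-relative path, keeping directory structure."""
    abs_paths = [os.path.abspath(p) for p in paths]
    # The deepest directory shared by all files becomes the collection root.
    root = os.path.commonpath([os.path.dirname(p) for p in abs_paths])
    return {p: os.path.relpath(p, root).replace(os.sep, "/") for p in abs_paths}


def workflow_properties(entrypoint):
    """Collection metadata marking the collection as a workflow.

    Hypothetical property names, chosen for illustration only.
    """
    return {"type": "workflow", "entrypoint": entrypoint}
```

 Because the layout mirrors the source tree, a @run: tools/align.cwl@ reference in the main document would resolve the same way inside the collection as it does on the user's filesystem.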

 At workbench: 
 # When registering a workflow record, create a wrapper workflow that has the same input/output interface as the workflow, with a single step with a run line like: 

 > run: keep:pdh/workflow 

 # Workbench, when it needs to display the workflow, fetches it from keep-web (WebDAV). 
 # To submit the workflow, workbench introspects the step and sets up the correct collection mount. 
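 A rough Python sketch of the wrapper idea, under several assumptions: the original workflow is already parsed into a dict, its @inputs@ and @outputs@ use the CWL map form (not the list form), and the step id @main@ and both function names are made up for illustration. The second function shows the kind of introspection workbench would need in order to find the collection to mount.

```python
def make_wrapper(workflow, pdh, path):
    """Build a wrapper with the same interface and one step running keep:<pdh>/<path>."""
    step_id = "main"  # hypothetical step name
    inputs = workflow.get("inputs", {})
    outputs = {}
    for name, spec in workflow.get("outputs", {}).items():
        # Map-form entries are either a bare type or a dict with a "type" key.
        out_type = spec["type"] if isinstance(spec, dict) else spec
        outputs[name] = {"type": out_type, "outputSource": f"{step_id}/{name}"}
    return {
        "cwlVersion": workflow.get("cwlVersion", "v1.0"),
        "class": "Workflow",
        # Running a workflow as a step requires this CWL feature.
        "requirements": {"SubworkflowFeatureRequirement": {}},
        "inputs": inputs,
        "outputs": outputs,
        "steps": {
            step_id: {
                "run": f"keep:{pdh}/{path}",
                "in": {name: name for name in inputs},
                "out": list(outputs),
            }
        },
    }


def wrapped_collection(wrapper):
    """Return (pdh, path) from the single step's run line, or None if not a keep: reference."""
    (step,) = wrapper["steps"].values()
    run = step["run"]
    if not run.startswith("keep:"):
        return None
    pdh, _, path = run[len("keep:"):].partition("/")
    return pdh, path
```

 The returned PDH is what the submitter would use to set up the collection mount for the container request; how the mount itself is expressed is left out of this sketch.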

 To maximize reuse, dependencies of each CommandLineTool are still copied to separate collections. 



 Alternatively, or in addition to storing workflows in collections, workflow records could reference git commits. Workbench could access git via js-git (the same way that Composer does it). 
