Idea #15535

Updated by Peter Amstutz over 4 years ago

arvados-cwl-runner, on submitting a workflow, uses "pack" to create a single-stream document.   

 This is because the "workflow" record used to display workflows on Workbench stores only a single raw text field, into which the multi-document CWL file has to be stuffed.    The rationale for making the workflow record a text field, rather than a PDH or git hash, is to avoid requiring that Workbench be able to fetch a collection or git repo to display a workflow.    Although this isn't a limitation when submitting from the command line, that path also uses the "pack" function to avoid having multiple code paths. 

 Unfortunately, the packed version often bears little resemblance to the user's original document.    It would be better to execute the original document. 

 Proposal: 

 # Upload original workflow files & dependencies to a collection, preserving original filesystem structure.    Define a new scheme for workflow records that points to the collection PDH.    Could also ditch 'workflow' record entirely and just use metadata on collections to indicate the collection stores a workflow. 
 # Workbench, when it needs to display the workflow, fetches it from keep-web (WebDAV) 
 # To maximize reuse, dependencies of each CommandLineTool are still copied to separate collections 
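
 A rough sketch of step 1, to make the idea concrete. This is not the Arvados SDK and not how arvados-cwl-runner works today: @snapshot_workflow_dir@, @content_address@ and @make_workflow_record@ are hypothetical names, the SHA-256 digest is only a stand-in for a real collection PDH, and the record shape is an assumption. It just shows the original directory layout being preserved and a workflow record pointing at a content-derived identifier instead of holding packed text.

```python
import hashlib
import os


def snapshot_workflow_dir(root):
    """Map each file under root to a digest, keyed by its relative path.

    Relative paths are kept so the original filesystem structure survives.
    """
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                entries[rel] = hashlib.sha256(f.read()).hexdigest()
    return entries


def content_address(entries):
    """Hash a canonical listing so identical trees get identical ids.

    Stand-in for a PDH; the real PDH is derived from the collection manifest.
    """
    listing = "".join(
        f"{path} {digest}\n" for path, digest in sorted(entries.items())
    )
    return hashlib.sha256(listing.encode("utf-8")).hexdigest()


def make_workflow_record(name, entries):
    """Hypothetical record: points at the collection, stores no workflow text."""
    return {
        "name": name,
        "collection_id": content_address(entries),
        "files": sorted(entries),
    }
```

 Because the identifier depends only on relative paths and file contents, re-submitting an unchanged workflow directory would resolve to the same collection, which is the property a PDH-based record would rely on.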

 Alternately / in addition to storing in collections, could reference git commits.    Workbench could access git via js-git (same way that Composer does it). 
