Feature #21074

Updated by Peter Amstutz 5 months ago

Idea: the "workflow" table is an odd duck. It stores a single data string in the "definition" field, but doesn't support properties, versioning, trashing, etc. We want these things for workflows, but we don't want to duplicate all the logic. It would be better if we could just store workflows in collections.

 However, eliminating the "workflows" API endpoint would be disruptive, as Workbench and arvados-cwl-runner both rely on it. (We can synchronize Workbench updates, but people frequently use older versions of arvados-cwl-runner with newer API servers.)

 To migrate workflow records to collections, I propose the following: 

 # Workflow records are migrated over to collections. The "name" and "description" fields are straightforward. The contents of the "definition" field would be put in Keep as "workflow.yml". The collection record would have metadata "type: cwl-workflow".
 # The Workflow endpoint is migrated to controller 
 # On controller, GET/PUT/POST operations are translated to apply only to collections with "type: cwl-workflow". The contents of "definition" would be read from / written to Keep.
 # When going through the workflows endpoint, collection UUIDs would be mapped to workflow UUIDs with the same cluster prefix and random part, with only -7fd4e- substituted for -4zz18-.
 # Going forward, we can choose to either expose additional fields and capabilities through the workflows endpoint (properties, versioning), or phase out the workflows endpoint by updating client code that uses workflows to instead use collections of "type: cwl-workflow" 
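
 The UUID mapping in the fourth step could be sketched roughly as follows. This is only an illustration, not controller code: the function names are hypothetical, and it assumes UUIDs have the usual three-part shape (5-character cluster prefix, 5-character type infix, 15-character random part).

```python
import re

# Type infixes named in the proposal above.
COLLECTION_INFIX = "4zz18"
WORKFLOW_INFIX = "7fd4e"

# Assumed UUID shape: cluster prefix, type infix, random part.
_UUID_RE = re.compile(r"([a-z0-9]{5})-([a-z0-9]{5})-([a-z0-9]{15})")


def _swap_infix(uuid, expected, replacement):
    """Replace the type infix, keeping the cluster prefix and random part."""
    m = _UUID_RE.fullmatch(uuid)
    if m is None or m.group(2) != expected:
        raise ValueError(f"not a -{expected}- UUID: {uuid!r}")
    return f"{m.group(1)}-{replacement}-{m.group(3)}"


def collection_to_workflow_uuid(uuid):
    """Hypothetical helper: UUID shown to clients of the workflows endpoint."""
    return _swap_infix(uuid, COLLECTION_INFIX, WORKFLOW_INFIX)


def workflow_to_collection_uuid(uuid):
    """Hypothetical helper: UUID of the collection backing a workflow."""
    return _swap_infix(uuid, WORKFLOW_INFIX, COLLECTION_INFIX)
```

 Because only the infix changes, the mapping is reversible and needs no lookup table: an older arvados-cwl-runner could keep using the workflow UUID it already has, and the controller would derive the collection UUID from it.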

 This is probably also an opportunity to extract other metadata from the CWL document and put it in collection properties, so that Workbench has it on hand without having to parse the CWL document as it currently does.
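
 As a rough sketch of what that extraction might look like, the function below takes an already-parsed CWL document (a dict) and returns candidate collection properties. Only "type: cwl-workflow" comes from the proposal; the other property names (cwl_version, cwl_class, cwl_label, cwl_doc, cwl_input_ids) are made up for illustration, and which fields are actually worth extracting is an open question.

```python
def extract_workflow_properties(cwl_doc):
    """Return collection properties derived from a parsed CWL document."""
    # Packed documents keep their processes in "$graph"; prefer "#main",
    # falling back to the first entry. Otherwise use the document itself.
    main = cwl_doc
    graph = cwl_doc.get("$graph")
    if isinstance(graph, list) and graph:
        main = next((p for p in graph if p.get("id") == "#main"), graph[0])

    # "type" is the marker from the proposal; the rest are hypothetical names.
    properties = {"type": "cwl-workflow"}

    version = cwl_doc.get("cwlVersion", main.get("cwlVersion"))
    if version is not None:
        properties["cwl_version"] = version

    for field in ("class", "label", "doc"):
        if field in main:
            properties["cwl_" + field] = main[field]

    # CWL allows "inputs" to be either a map keyed by id or a list of records.
    inputs = main.get("inputs", [])
    if isinstance(inputs, dict):
        input_ids = list(inputs)
    else:
        input_ids = [i["id"] for i in inputs if isinstance(i, dict) and "id" in i]
    properties["cwl_input_ids"] = input_ids

    return properties
```

 With properties like these stored on the collection, Workbench could list workflows with their labels and inputs from the collection record alone, and would only need to fetch "workflow.yml" from Keep when the full definition is required.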
