Idea #22672
Updated by Peter Amstutz 15 days ago
Currently, development best practice is to keep your code (including CWL workflows) in a git repository.
When ready to run in Arvados, the workflows need to be published. This currently involves using @arvados-cwl-runner --create-workflow@ or @arvados-cwl-runner --update-workflow@.
Automating the publishing of a new or updated workflow should be straightforward, but you need to record (at minimum) the workflow's UUID so that later runs update it instead of creating a duplicate.
The minimal version is for the user to write a shell script line that runs @arvados-cwl-runner --update-workflow@ with the right UUID. That gets tedious, and would be very obnoxious for a git repository with a large number of tools or workflows (for example, we'd like to import "bio-cwl-tools":https://github.com/common-workflow-library/bio-cwl-tools , which has hundreds).
So I think it makes sense to have a tool called something like @arv-workflow-sync@ to help manage this. I'm envisioning that it would keep a file in the repository root called something like @arvados-workflows.json@, which records which workflows in the git repo should be pushed to which projects and workflow UUIDs. Running @arv-workflow-sync@ pushes them all.
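To make the proposal concrete, here is a rough sketch of what the core of such a tool might look like. Everything about the manifest layout is an assumption for illustration: the top-level @workflows@ list and the @path@, @project_uuid@ and @uuid@ keys are hypothetical, as are the function names. It also assumes that @arvados-cwl-runner --create-workflow@ prints the new workflow's UUID on stdout, which would need to be checked before relying on it.

```python
import json
import subprocess


def build_commands(manifest):
    """Turn manifest entries into arvados-cwl-runner argument lists.

    The manifest schema used here is hypothetical, not an existing format.
    """
    commands = []
    for entry in manifest["workflows"]:
        if entry.get("uuid"):
            # Already published: update the existing workflow in place.
            cmd = ["arvados-cwl-runner", "--update-workflow", entry["uuid"]]
        else:
            # Not published yet: create it in the target project.
            cmd = ["arvados-cwl-runner", "--create-workflow",
                   "--project-uuid", entry["project_uuid"]]
        cmd.append(entry["path"])
        commands.append(cmd)
    return commands


def sync(manifest_path="arvados-workflows.json"):
    """Push every workflow in the manifest and record newly assigned UUIDs."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    for entry, cmd in zip(manifest["workflows"], build_commands(manifest)):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        if not entry.get("uuid"):
            # Assumption: on create, the runner prints the new UUID on stdout.
            entry["uuid"] = result.stdout.strip()
    # Write UUIDs back so the next run updates instead of creating.
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)
```

Writing the UUIDs back into the manifest means the file can be committed alongside the workflows, so anyone who clones the repository updates the same workflow records rather than creating new ones.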
(Another thought that I had was to query collection properties to find a workflow collection that matches the git info and figure out what to update that way.)
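A sketch of that alternative, under assumptions: it supposes that published workflow collections carry the git origin and the in-repo path as properties, here named @arv:gitOrigin@ and @arv:gitPath@. Those property names and the helper name are assumptions to be verified against what @arvados-cwl-runner@ actually records. The helper only builds the filter list, so it runs without a cluster.

```python
def workflow_collection_filters(git_origin, git_path):
    """Build Arvados list filters matching a workflow by its git source.

    Assumption: collections record the repository URL and the file path
    under the property names used below.
    """
    return [
        ["properties.arv:gitOrigin", "=", git_origin],
        ["properties.arv:gitPath", "=", git_path],
    ]
```

The result would be passed as the @filters@ argument of a collections list call in the Arvados Python SDK. A single match would identify the workflow to update, and no match would mean it has to be created.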
(Side note: people could edit workflows directly in Arvados, but we haven't yet built out the version control capabilities that would make that a comparable experience.)