Feature #19385
closed
a-c-r uploads workflow files + dependencies to a collection & executes from that instead of packed workflows
Updated by Peter Amstutz about 2 years ago
Packing is bad
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-08-31 sprint to 2022-09-14 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-09-14 sprint to 2022-09-28 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-09-28 sprint to 2022-10-12 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-10-12 sprint to 2022-11-09 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-11-09 sprint to 2022-11-23 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-11-23 sprint to 2022-12-07 Sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint
Updated by Peter Amstutz almost 2 years ago
- Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint
Updated by Peter Amstutz almost 2 years ago
- Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
Updated by Peter Amstutz almost 2 years ago
- Target version changed from 2023-02-01 sprint to 2023-01-18 sprint
Updated by Peter Amstutz almost 2 years ago
- Story points set to 5.0
- Status changed from New to In Progress
Updated by Peter Amstutz almost 2 years ago
- Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
Updated by Peter Amstutz almost 2 years ago
- Target version changed from 2023-02-01 sprint to 2023-02-15 sprint
Updated by Peter Amstutz almost 2 years ago
19385-cwl-fast-pack @ 083b86a4e748900bcc285cac8bfd2ecdd36679f6
This is a significant rewrite of the upload_workflow() method.
The purpose of this method is to bundle the workflow up into a form where it can be uploaded to Arvados for execution, with all of the workflow's external dependencies replaced with Arvados references.
The previous approach was to "pack" the workflow into a monolithic JSON document, but that approach has two drawbacks:
- The resulting "packed" file is reformatted from the original file and is not particularly human-friendly.
- The "pack" process itself is slow.
The new approach uploads the files making up the workflow to a Collection. The files are lightly updated, but the processing is much less intensive than using pack(). The resulting files in Arvados are also much closer to (or entirely unchanged from) the original files.
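As a rough illustration of the collection-based approach described above (not the code in this branch), the sketch below gathers the files reachable from a main workflow and computes a layout that preserves their relative positions, so relative run references could stay unchanged. The function names find_workflow_files and collection_layout are hypothetical, the documents are assumed to be already parsed into dicts with steps in list form, and no Arvados API is called.

```python
import posixpath


def find_workflow_files(main_path, documents):
    """Return every file reachable from main_path through step `run` references.

    `documents` maps a POSIX path to an already-parsed CWL document (a dict).
    Assumption: steps are given in list form and `run` is a relative path string.
    """
    seen = set()
    pending = [posixpath.normpath(main_path)]
    while pending:
        path = pending.pop()
        if path in seen:
            continue
        seen.add(path)
        for step in documents[path].get("steps", []):
            run = step.get("run")
            if isinstance(run, str):
                # `run` is resolved relative to the file that contains it.
                target = posixpath.join(posixpath.dirname(path), run)
                pending.append(posixpath.normpath(target))
    return sorted(seen)


def collection_layout(paths):
    """Map each source path to a path relative to the files' common directory."""
    base = posixpath.commonpath([posixpath.dirname(p) for p in paths])
    return {p: posixpath.relpath(p, base) for p in paths}


# Hypothetical example workflow spread over three files.
documents = {
    "/home/me/wf/main.cwl": {
        "class": "Workflow",
        "steps": [
            {"id": "a", "run": "tools/a.cwl"},
            {"id": "b", "run": "../shared/b.cwl"},
        ],
    },
    "/home/me/wf/tools/a.cwl": {"class": "CommandLineTool"},
    "/home/me/shared/b.cwl": {"class": "CommandLineTool"},
}

files = find_workflow_files("/home/me/wf/main.cwl", documents)
layout = collection_layout(files)
```

Because the layout keeps every file at the same position relative to the others, a reference such as ../shared/b.cwl still resolves inside the uploaded tree. Only references that point outside the tree would need rewriting.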
This branch also streamlines the workflow launch process by eliminating instances where it would re-load the Workflow document. This turned out to be largely redundant work that contributed significantly to the runtime.
When executing arvados-cwl-runner --create-workflow on a large customer workflow, execution time went from 8m9s on 2.5.0 to 1m16s on this branch.
This branch also adds support for the --fast-parser feature (not yet enabled by default). This uses a different code path for parsing and validating the CWL document, which is significantly more efficient and results in even more runtime improvement (36s in the previous example). However, there are usability issues around reporting parsing and runtime errors that are still being worked on.
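To put the timings quoted above in perspective, here is a small calculation of the speedup factors. It uses only the three figures from this comment (8m9s, 1m16s, 36s); the helper name seconds is just for illustration.

```python
def seconds(minutes, secs):
    """Convert a minutes/seconds wall-clock time to seconds."""
    return minutes * 60 + secs


baseline = seconds(8, 9)      # 2.5.0: 8m9s
branch = seconds(1, 16)       # this branch: 1m16s
fast_parser = seconds(0, 36)  # this branch with --fast-parser: 36s

speedup_branch = baseline / branch
speedup_fast_parser = baseline / fast_parser

print(f"branch: {branch}s, {speedup_branch:.1f}x faster than 2.5.0")
print(f"branch + --fast-parser: {fast_parser}s, {speedup_fast_parser:.1f}x faster than 2.5.0")
```

That works out to roughly 6.4x for the branch alone and roughly 13.6x with --fast-parser, for this one workflow.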
Tested and passing with CWL unit tests, CWL conformance tests v1.0 - v1.2, and Arvados CWL integration tests.
There are quite a lot of commits here, so I recommend reviewing by looking at git diff main..19385-cwl-fast-pack rather than trying to follow the development history.
Updated by Peter Amstutz almost 2 years ago
- Status changed from In Progress to Resolved