Bug #19688
closed
Launch registered workflows faster
Description
Launching very large registered workflows is slow, which is a serious problem. Relatedly, registered workflows can be huge records that make project list responses slow.
Fast launch
Assume that the workflow has been fully processed. Make a fast path which takes the workflow as-is and constructs a container request.
Lightweight records
Copy the input and output sections into a wrapper workflow. The wrapper workflow has one step that invokes the real workflow, which is stored in a collection.
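The wrapper idea can be sketched roughly as follows. This is an illustration only, not the code from the branch: the function name `make_wrapper`, the step name `run`, and the `keep:.../workflow.json#main` reference in the usage note are assumptions made for the example.

```python
import copy


def make_wrapper(main, run_ref, step_name="run"):
    """Build a one-step wrapper with the same inputs/outputs as `main`.

    `main` is the top-level Workflow object of an already-packed CWL
    document; `run_ref` is wherever the real workflow is stored
    (hypothetical here, e.g. a file inside a collection).
    """
    def short(ident):
        # "#main/reads" -> "reads"
        return ident.rsplit("/", 1)[-1].lstrip("#")

    inputs, step_in = [], []
    for inp in main["inputs"]:
        name = short(inp["id"])
        wrapped = copy.deepcopy(inp)
        wrapped["id"] = name
        inputs.append(wrapped)
        # Pass each wrapper input straight through to the single step.
        step_in.append({"id": name, "source": name})

    outputs, step_out = [], []
    for out in main["outputs"]:
        name = short(out["id"])
        outputs.append({
            "id": name,
            "type": copy.deepcopy(out["type"]),
            "outputSource": f"{step_name}/{name}",
        })
        step_out.append(name)

    return {
        "cwlVersion": main.get("cwlVersion", "v1.2"),
        "class": "Workflow",
        # Needed because the single step runs another Workflow.
        "requirements": [{"class": "SubworkflowFeatureRequirement"}],
        "inputs": inputs,
        "outputs": outputs,
        "steps": [{"id": step_name, "in": step_in, "out": step_out, "run": run_ref}],
    }
```

For example, `make_wrapper(main, "keep:<portable data hash>/workflow.json#main")` would give a small record whose size depends only on the number of inputs and outputs, not on the number of steps in the real workflow.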
Files
Updated by Peter Amstutz about 2 years ago
- Related to Idea #17848: CWL runner improvements added
Updated by Peter Amstutz about 2 years ago
- File arvados-cwl-runner-2.5.0.dev20221104015049.tar.gz added
Dev package for testing. Includes code from this branch as well as #19699
Updated by Peter Amstutz about 2 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz about 2 years ago
19688-cwl-fast-path @ 9f39acdc1d65d8b2714ffccd51a0311e6df9ec4a
Adds a new "fast path" for running workflows that have been registered on the API server.
This leverages the fact that all the dependency rewriting and workflow packing has already taken place. Previously, it was re-doing this work. For some large customer workflows, this is extremely expensive (taking multiple minutes).
Key points:
- Enables fast path when the workflow is being submitted and is provided as a workflow UUID
- It minimizes overhead by only loading the top level workflow and suppressing the load of the workflow steps
- When submitting the workflow, it avoids going through the "packing" step and simply copies the already-packed workflow from the workflow record into the container request
- To further reduce overhead in submitting, the registered workflow is now a wrapper workflow which has the same input/output signature as the original workflow, but only a single step, which invokes the packed workflow. Combined with the first bullet (not loading workflow steps), this avoids virtually all of the parsing overhead during submit.
- Puts the packed workflow into a collection. This has the added benefit of partially addressing a couple of other long-standing issues:
  - Workflow definitions are now visible in Workbench by visiting the collection.
  - Potentially huge workflow definitions are no longer stored in the "mounts" field of container requests, which led to slow load times of project views.
- Fixed tests
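As a rough sketch of the first and third key points (choosing the fast path, then copying rather than re-packing), something like the following could apply. The UUID pattern and the two callbacks `fetch_definition` and `pack_from_source` are assumptions for illustration and are not taken from the branch.

```python
import re

# Assumed shape of a registered workflow UUID: cluster prefix, a type
# infix, and a 15-character suffix. Treat the infix as an assumption.
WORKFLOW_UUID = re.compile(r"[a-z0-9]{5}-7fd4e-[a-z0-9]{15}")


def load_for_submit(workflow_arg, submit, fetch_definition, pack_from_source):
    """Return (path_taken, definition) for a workflow about to be run.

    fetch_definition(uuid) -> already-packed definition from the record
    pack_from_source(path) -> definition produced by the full load/pack
    Both callbacks are hypothetical stand-ins.
    """
    if submit and WORKFLOW_UUID.fullmatch(workflow_arg):
        # Fast path: dependency rewriting and packing were done at
        # registration time, so the stored definition is used as-is.
        return "fast", fetch_definition(workflow_arg)
    # Everything else goes through the original, slower path.
    return "full", pack_from_source(workflow_arg)
```

The point of the split is that the expensive callback is never invoked when the argument is a UUID and the workflow is being submitted.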
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-11-09 sprint to 2022-11-23 sprint
Updated by Peter Amstutz about 2 years ago
- Related to Bug #19699: HTTP download creates collections with too-long names, needs flag to run in runner process after submission added
Updated by Peter Amstutz about 2 years ago
- File arvados-cwl-runner-2.5.0.dev20221113232614.tar.gz added
When registering a single command line tool, it will run with the same of the original workflow including git description.
On second thought, this isn't what we want. Another package coming up.
Updated by Peter Amstutz about 2 years ago
- File arvados-cwl-runner-2.5.0.dev20221114005313.tar.gz added
Make sure the wrapper has the git info, which is what we really want. Also name the step after the CWL file.
Updated by Peter Amstutz about 2 years ago
19688-cwl-fast-path @ 0b39c68ee38afbfec9f7d6d082a52cc2681edbea
See note 6 for details. This is a follow-up fix so that git info is copied into the wrapper.
Updated by Peter Amstutz about 2 years ago
Updated by Peter Amstutz about 2 years ago
- Status changed from In Progress to Resolved