Bug #20533
Closed: Better handling of request surges when canceling a large workflow
Story points: -
Release:
Release relationship: Auto
Description
Specific test case: running a workflow with hundreds of containers and then canceling them all at once causes a massive surge of requests to the API server, because every container finalizes at the same time.
We want to test ways to mitigate this traffic surge so that:
- all the containers finalize without fatal 503 errors (#20540, #20541)
- the workbench remains responsive (at least for GET requests during this time)
Approaches to evaluate:
- configuration changes
- load balancing #20539
- controller request prioritization #20602
- sending out cancellations at a slower rate than the current behavior
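Pacing cancellations can be sketched as simple batching: cancel a fixed number of containers, pause, repeat. This is a minimal Python sketch under stated assumptions, not how Arvados currently sends out cancellations: `cancel_container` is a hypothetical callable standing in for whatever request actually cancels one container, and `batch_size` / `pause_seconds` are illustrative knobs, not existing Arvados configuration settings.

```python
import time
from typing import Callable, Iterable, List


def cancel_in_batches(
    container_uuids: Iterable[str],
    cancel_container: Callable[[str], None],  # hypothetical: sends one cancel request
    batch_size: int = 20,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Cancel containers batch_size at a time, pausing between batches.

    Spreading the cancellations out also spreads out the finalization
    requests that follow, instead of every container hitting the API
    server at the same moment.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    uuids = list(container_uuids)
    cancelled: List[str] = []
    for start in range(0, len(uuids), batch_size):
        if start > 0:
            # Pause between batches so the API server can drain its queue.
            sleep(pause_seconds)
        for uuid in uuids[start:start + batch_size]:
            cancel_container(uuid)
            cancelled.append(uuid)
    return cancelled
```

With 250 containers, `batch_size=20` and `pause_seconds=1.0`, the cancellations go out in 13 batches spread over roughly 12 seconds. A token bucket would give a smoother rate; batching is shown only because it is the simplest pacing scheme to reason about.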
Related issues
Updated by Peter Amstutz over 1 year ago
- Target version changed from Future to To be scheduled
Updated by Peter Amstutz over 1 year ago
- Target version changed from To be scheduled to Development 2023-06-07
Updated by Peter Amstutz over 1 year ago
- Description updated (diff)
- Subject changed from Better handling of request surges to Better handling of request surges when canceling a large workflow
Updated by Peter Amstutz over 1 year ago
- Related to Idea #20599: Scaling to 1000s of concurrent containers added
Updated by Peter Amstutz over 1 year ago
- Related to Idea #20602: Prioritize requests made by workbench 2 added
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-06-07 to Future
Updated by Peter Amstutz over 1 year ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 1 year ago
- Status changed from In Progress to Feedback
Updated by Peter Amstutz over 1 year ago
I ran a test with 250 containers and then hit cancel. The request queue does fill up immediately as the containers try to finalize, but:
- Workbench remains responsive (very important)
- I believe all the crunch-run processes retry and terminate gracefully (but a bug I thought I fixed might still be a thing: https://dev.arvados.org/issues/20614#note-14)
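Retrying is what lets each container ride out the 503 responses during the surge instead of failing. A common client-side pattern is exponential backoff with "full jitter", so that clients which failed together do not all retry together. The Python sketch below only illustrates that pattern; it is not crunch-run's actual retry logic, and `ServiceUnavailable` and `call_with_backoff` are hypothetical names.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Hypothetical exception for an HTTP 503 response from the API server."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 8,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call request(), retrying on ServiceUnavailable with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return request()
        except ServiceUnavailable:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the 503
            # Full jitter: wait a random time in [0, cap], where cap doubles
            # each attempt up to max_delay, so clients that failed together
            # spread their retries out instead of surging again in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0.0, cap))
    raise AssertionError("unreachable")
```

With the defaults, the caps run 0.5, 1, 2, 4, 8, 16 and 30 seconds, so a client that keeps getting 503s waits at most 61.5 seconds in total before giving up.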
Updated by Peter Amstutz over 1 year ago
- Release set to 66
- Target version deleted (Future)
- Status changed from Feedback to Resolved