Bug #20533

closed

Better handling of request surges when canceling a large workflow

Added by Peter Amstutz 11 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version: -
Story points: -
Release relationship: Auto

Description

Specific test case: running a workflow with hundreds of containers and then canceling them all at once causes a massive surge of requests to the API server, because every container finalizes at the same time.

We want to test ways to mitigate this traffic surge so that:

  1. all the containers finalize without fatal 503 errors (#20540, #20541)
  2. the workbench remains responsive (at least for GET requests during this time)
    1. evaluate configuration changes
    2. load balancing #20539
    3. controller request prioritization #20602
    4. send out cancellations at a slower rate than the current behavior
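
As an illustration of the last option (slower cancellations), here is a minimal client-side sketch that spaces cancellation calls out at a fixed rate. It is not taken from the Arvados code base: `cancel_throttled`, the `cancel_one` callback, and the default of 5 calls per second are all hypothetical, and the callback stands in for whatever call actually cancels a single container.

```python
import time
from typing import Callable, Iterable, List


def cancel_throttled(
    uuids: Iterable[str],
    cancel_one: Callable[[str], None],  # hypothetical: cancels one container
    per_second: float = 5.0,            # arbitrary example rate, not a tuned value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Call cancel_one(uuid) for each UUID, starting at most per_second calls per second.

    Returns the UUIDs in the order they were cancelled.
    """
    if per_second <= 0:
        raise ValueError("per_second must be positive")
    interval = 1.0 / per_second
    done: List[str] = []
    next_slot = clock()  # earliest time the next call may start
    for uuid in uuids:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)
        cancel_one(uuid)
        done.append(uuid)
        # Schedule the next call at least one interval after this one.
        next_slot = max(next_slot, clock()) + interval
    return done
```

Pacing on the client only smooths the cancellation requests themselves. The finalization traffic from each container would still need to be handled by the other options listed above.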

Related issues

Related to Arvados Epics - Idea #20599: Scaling to 1000s of concurrent containers (Resolved, 06/01/2023, 03/31/2024)
Related to Arvados - Idea #20602: Prioritize requests made by workbench 2 (Resolved, Tom Clegg, 06/08/2023)