Bug #20533

Updated by Peter Amstutz 10 months ago

Specific test case: running a workflow with hundreds of containers and then canceling them all at once causes a massive surge of requests to the API server, because every container finalizes at the same time.

 We want to test ways to mitigate this traffic surge so that:

 # all the containers finalize without fatal 503 errors (#20540, #20541) 
 # Workbench remains responsive during this time (at least for GET requests)
 ## evaluate configuration changes 
 ## load balancing #20539 
 ## controller request prioritization discussed below 

 Previous text:

 When the API server is snowed under by requests (I believe these are updates from running containers, but need to collect more data to be sure), it should still be accessible to Workbench, at least read-only. We should consider doing something like the existing request limiter for logging updates, but applying it to all PUT and POST requests, so that GET requests can still be processed and Workbench doesn't completely fall over (which looks bad).
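
 The idea of limiting PUT and POST requests while letting GET requests through could be sketched roughly as follows. This is an illustration only, not the controller's actual code: the @WriteLimiter@ class, its @handle@ method, and the @max_concurrent_writes@ parameter are hypothetical names, and the sketch rejects excess writes with a 503 rather than queueing them, which is only one of the possible designs.

```python
import threading

# Methods treated as read-only; these are never throttled in this sketch.
READ_METHODS = {"GET", "HEAD", "OPTIONS"}


class WriteLimiter:
    """Hypothetical limiter: caps concurrent write requests, reads always pass."""

    def __init__(self, max_concurrent_writes):
        # One semaphore slot per write request allowed to run at the same time.
        self._slots = threading.BoundedSemaphore(max_concurrent_writes)

    def handle(self, method, handler):
        """Run handler() and return (status, body).

        Read requests are always served. A write request is served only if
        a slot is free; otherwise it is rejected immediately with 503.
        """
        if method.upper() in READ_METHODS:
            return 200, handler()
        if not self._slots.acquire(blocking=False):
            return 503, "write limit reached, retry later"
        try:
            return 200, handler()
        finally:
            # Free the slot even if the handler raised.
            self._slots.release()
```

 With a limit of 1, a second PUT arriving while the first is still in progress gets a 503, while a GET arriving at the same moment is still served. Since the goal above is for containers to finalize without fatal 503 errors, a real implementation would probably need clients to retry rejected writes, or would queue them instead of rejecting them.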
