Bug #21194
open
Mass container cancellations slam the API server
Story points:
-
Description
Observed: when someone terminates a large workflow with hundreds or thousands of processes, all of those processes need to check in with the API server at once. This creates a flood of requests that can redline the API server for several minutes and even block workbench access.
Request prioritization should mean that workbench API calls continue to work.
However, we've observed user scripts sometimes getting errors back during this period.
My idea is to have crunch-run submit its API calls as "low priority", so that when the request queue fills up, the crunch-run requests are pushed out and retried, while requests that do not come from the cancelled crunch-run processes wait in the normal line.
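To make the idea concrete, here is a minimal sketch of a bounded request queue that pushes out low-priority entries when it fills up. It is only an illustration, not the API server's actual code: the names `PriorityRequestQueue`, `QueueFull`, `LOW`, and `NORMAL` are hypothetical, and the choices to evict the newest low-priority entry and to serve higher priorities first are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

# Hypothetical priority levels: crunch-run traffic would use LOW.
LOW, NORMAL = 0, 1


class QueueFull(Exception):
    """The request was not queued; the caller is expected to retry later."""


@dataclass
class Entry:
    priority: int
    seq: int  # arrival order, used to break ties
    request: str


class PriorityRequestQueue:
    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._entries: List[Entry] = []
        self._seq = count()

    def enqueue(self, request: str, priority: int) -> Optional[str]:
        """Queue a request. Returns the request that was pushed out, if any."""
        evicted = None
        if len(self._entries) >= self.max_size:
            # Victim: lowest priority; among equals, the most recent arrival.
            victim = min(self._entries, key=lambda e: (e.priority, -e.seq))
            if victim.priority >= priority:
                # Nothing queued is less important than the newcomer.
                raise QueueFull(request)
            self._entries.remove(victim)
            evicted = victim.request
        self._entries.append(Entry(priority, next(self._seq), request))
        return evicted

    def dequeue(self) -> Optional[str]:
        """Return the next request: highest priority first, FIFO within a level."""
        if not self._entries:
            return None
        nxt = max(self._entries, key=lambda e: (e.priority, -e.seq))
        self._entries.remove(nxt)
        return nxt.request


if __name__ == "__main__":
    q = PriorityRequestQueue(max_size=3)
    q.enqueue("crunch-run-1", LOW)
    q.enqueue("crunch-run-2", LOW)
    q.enqueue("workbench-1", NORMAL)
    # The queue is full, so a low-priority request is pushed out to make room.
    print(q.enqueue("user-script-1", NORMAL))  # crunch-run-2
    print(q.dequeue())  # workbench-1
```

In this sketch a pushed-out or rejected request is simply handed back to the caller. In a real server that would presumably map to an error response that the client retries after a delay, which is the behavior the proposal relies on for crunch-run.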