Bug #21194

open

Mass container cancellations slam the API server

Added by Peter Amstutz 6 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: Crunch
Target version:
Story points: -

Description

Observed: when someone terminates a large workflow with hundreds or thousands of processes, all of those processes need to check in with the API server at once. This creates a flood of activity that can redline the API server for multiple minutes, and can even block workbench access.

Request prioritization should mean that workbench API calls continue to work.

However, we've observed user scripts sometimes getting errors back during this period.

My idea is to have crunch-run submit its API calls as "low priority", so that when the request queue fills up, the crunch-run requests are pushed out and retried, while other requests that do not come from the cancelled crunch-run processes wait in the normal line.
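
To make the idea concrete, here is a minimal sketch of a bounded request queue that pushes out low-priority requests when it fills up. It is an illustration only, not the API server's actual queueing code: the `RequestQueue` class, the `LOW`/`NORMAL` labels, the `max_queued` limit, and the request IDs are all hypothetical names made up for this example. It also assumes normal requests are served before low-priority ones, which is one possible policy.

```python
from collections import deque

# Hypothetical priority labels; crunch-run would tag its calls as LOW.
LOW, NORMAL = "low", "normal"


class RequestQueue:
    """Bounded queue that sheds low-priority requests first when full."""

    def __init__(self, max_queued):
        self.max_queued = max_queued
        self._normal = deque()  # FIFO line for ordinary requests
        self._low = deque()     # FIFO line for low-priority requests

    def __len__(self):
        return len(self._normal) + len(self._low)

    def submit(self, request_id, priority=NORMAL):
        """Queue a request.

        Returns the ID of a request that was pushed out (its client
        should retry later), or None if nothing was pushed out.
        """
        if len(self) < self.max_queued:
            line = self._low if priority == LOW else self._normal
            line.append(request_id)
            return None
        if priority == LOW:
            # Queue is full: a low-priority newcomer is turned away.
            return request_id
        if self._low:
            # Queue is full: make room by evicting the newest
            # low-priority request, then admit the normal one.
            evicted = self._low.pop()
            self._normal.append(request_id)
            return evicted
        # Queue is full of normal requests: turn the newcomer away.
        return request_id

    def next_request(self):
        """Return the next request to serve, normal requests first."""
        if self._normal:
            return self._normal.popleft()
        if self._low:
            return self._low.popleft()
        return None


if __name__ == "__main__":
    q = RequestQueue(max_queued=3)
    for i in range(4):
        pushed_out = q.submit(f"crunch-run-{i}", LOW)
        print(f"crunch-run-{i} submitted, pushed out: {pushed_out}")
    print("pushed out:", q.submit("workbench-1", NORMAL))
    print("served first:", q.next_request())
```

In this sketch a pushed-out request is simply reported back to the caller. In practice the server would have to signal the client to retry (for example with an error response), and the crunch-run side would need to retry with some delay so the retries do not recreate the same flood.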

Actions #1

Updated by Peter Amstutz 6 months ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)