Bug #20667

closed

atQuota should dynamically lower maxSupervisors

Added by Peter Amstutz 11 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Story points: -
Release relationship: Auto

Description

If the initial value of maxInstances is large and the dispatcher becomes limited by cloud quota (as opposed to getting 503 errors from the controller), it does not recompute maxSupervisors based on the number of instances that can actually be run. This can result in inefficient usage or starvation, because maxSupervisors does not reflect the actual limits of the cluster.
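To illustrate the intended behavior, here is a minimal sketch of recomputing the supervisor limit from whichever ceiling is lower: the configured maxInstances or the instance count at which the cloud reported a quota error. The function name, the `quotaLimit` parameter, and the `fraction` parameter are assumptions made for this example and are not taken from the dispatcher's code.

```go
package main

// effectiveMaxSupervisors is a hypothetical helper, not the dispatcher's
// actual implementation.
//
// maxInstances: configured instance ceiling (0 means no configured limit).
// quotaLimit:   assumed number of instances that were running when the cloud
//               last returned a quota error (0 means no quota error seen).
// fraction:     assumed share of instances allowed to run supervisors.
func effectiveMaxSupervisors(maxInstances, quotaLimit int, fraction float64) int {
	// Start from the configured ceiling, then lower it if the cloud has
	// shown that fewer instances can actually be run.
	limit := maxInstances
	if quotaLimit > 0 && (limit == 0 || quotaLimit < limit) {
		limit = quotaLimit
	}
	if limit == 0 || fraction <= 0 {
		// In this sketch, 0 means "no supervisor limit applies".
		return 0
	}
	n := int(float64(limit) * fraction)
	if n < 1 {
		// Always allow at least one supervisor so work can start.
		n = 1
	}
	return n
}
```

With maxInstances set to 1000, a fraction of 0.5, and a quota error at 250 instances, this sketch returns 125 instead of 500, so the supervisor limit tracks what the cluster can really run.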

It would also be nice to have a metric that shows when we are unable to start nodes because the cloud has returned a capacity error.
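One possible shape for such a metric is a per-error-code counter plus the most recent error code, which a dashboard could then display. The sketch below uses only the standard library; the type and method names are hypothetical, and a real implementation would presumably export these values through whatever metrics system the dispatcher already uses.

```go
package main

import "sync"

// capacityErrorStats is a hypothetical in-memory metric that records
// capacity errors returned by the cloud when starting nodes.
type capacityErrorStats struct {
	mtx      sync.Mutex
	counts   map[string]int // number of errors seen, keyed by error code
	lastCode string         // most recent error code, "" if none seen
}

// Observe records one capacity error with the given cloud error code.
func (s *capacityErrorStats) Observe(code string) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	if s.counts == nil {
		s.counts = make(map[string]int)
	}
	s.counts[code]++
	s.lastCode = code
}

// Count returns how many times the given error code has been observed.
func (s *capacityErrorStats) Count(code string) int {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	return s.counts[code]
}

// LastCode returns the most recently observed error code.
func (s *capacityErrorStats) LastCode() string {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	return s.lastCode
}
```

Keying the counter by error code would make a case like InsufficientFreeAddressesInSubnet visible without reading the syslog.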

My test case ran out of quota at 250 nodes (InsufficientFreeAddressesInSubnet), and this wasn't apparent from the dashboard. I had to dig into the syslog, and even then it was hard to find.


Subtasks 1 (0 open, 1 closed)

Task #20675: Review 20667-maxsuper-atquota (Resolved, Peter Amstutz, 06/26/2023)
