Bug #20894

closed

more config defaults

Added by Peter Amstutz 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Deployment
Target version:
Story points: -
Release relationship: Auto

Description

I just found an interesting cloud dispatch bug. We start with maxConcurrency at 16 and SupervisorFraction at 0.30. If each workflow only starts one subprocess, then supervisors take up to 0.30 of maxConcurrency and their subprocesses take another 0.30, so we use 0.60 of maxConcurrency. The problem is that maxSupervisors is based on maxConcurrency, so even though there is a big backlog of supervisor processes that want to run, they don't get scheduled. And because we're only using 60% of capacity, the dispatcher doesn't try to raise maxConcurrency either.
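The arithmetic behind the 0.60 figure can be sketched as follows. This is a simplified model, not the dispatcher's actual code: the function names are hypothetical, and it assumes supervisors are capped at SupervisorFraction times maxConcurrency with no integer rounding.

```python
def steady_state_utilization(supervisor_fraction: float,
                             children_per_supervisor: int) -> float:
    """Fraction of maxConcurrency in use once every supervisor slot is filled.

    Simplified model (an assumption, not the dispatcher's code): supervisors
    are capped at supervisor_fraction * maxConcurrency, and each running
    supervisor keeps children_per_supervisor subprocesses running.
    """
    used = supervisor_fraction * (1 + children_per_supervisor)
    return min(1.0, used)


def min_fraction_for_full_use(children_per_supervisor: int) -> float:
    """Smallest supervisor fraction at which this model reaches full utilization."""
    return 1.0 / (1 + children_per_supervisor)


# Defaults described above: SupervisorFraction 0.30, one subprocess per workflow.
print(steady_state_utilization(0.30, 1))
# Fraction needed so that one-subprocess workflows can fill maxConcurrency.
print(min_fraction_for_full_use(1))
```

Under this model, one-subprocess workflows can only fill maxConcurrency when the supervisor fraction is at least one half.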

I think the answer is that the default value of SupervisorFraction has to be 50%, and/or the default value of InitialQuotaEstimate should be 0 (which sets it to match MaxInstances).

I didn't see this on the scale cluster test, but I had already adjusted SupervisorFraction to 0.45 and InitialQuotaEstimate to 400, and the synthetic workflow does have a phase where it submits two parallel jobs, which would push the queue up to the maximum.

So I wonder whether, without that parallel job phase, it would have been stuck at 400.
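To make the scale-test question concrete, here is a rough sketch of how many slots would be occupied under each workflow shape. The helper name is hypothetical, the rounding of the supervisor cap is an assumption, and the utilization level at which the dispatcher decides to raise maxConcurrency is not modeled because it is not stated here.

```python
def slots_in_use(quota_estimate: int,
                 supervisor_fraction: float,
                 children_per_supervisor: int) -> int:
    """Slots occupied when the supervisor cap is reached (simplified model).

    Assumes the supervisor cap is quota_estimate * supervisor_fraction,
    rounded to the nearest integer; the real dispatcher may round differently.
    """
    max_supervisors = round(quota_estimate * supervisor_fraction)
    wanted = max_supervisors * (1 + children_per_supervisor)
    return min(quota_estimate, wanted)


# Scale test settings: InitialQuotaEstimate 400, SupervisorFraction 0.45.
one_job = slots_in_use(400, 0.45, 1)   # workflow with a single job at a time
two_jobs = slots_in_use(400, 0.45, 2)  # phase that submits two parallel jobs
print(one_job, two_jobs)
```

In this model the single-job shape leaves some of the 400 slots idle, while the two-parallel-job phase asks for more than 400 and so saturates the estimate. That is consistent with the suspicion that the parallel phase is what pushed the queue to the maximum.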


Subtasks: 1 (0 open, 1 closed)

Task #20895: Review 20894-instances-default (Resolved, Peter Amstutz, 08/24/2023)
