Bug #20894
more config defaults (Closed)
Description
I just found an interesting cloud dispatch bug. We start with maxConcurrency at 16 and SupervisorFraction at .30. If each workflow starts only one subprocess, each supervisor occupies two slots (itself plus its child), so we use 2 × .30 = .60 of maxConcurrency. The problem is that maxSupervisors is derived from maxConcurrency, so even though there is a big backlog of supervisor processes waiting to run, they don't get scheduled. And because we're only using 60% of capacity, the dispatcher never tries to raise maxConcurrency either.
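To make the stall concrete, here is a minimal Python sketch of one scheduling pass. It is a hypothetical simplification, not the dispatcher's actual code. It assumes (1) the supervisor limit is maxConcurrency × SupervisorFraction rounded down, and (2) maxConcurrency is only raised once every slot is in use. The names `schedule_tick` and `supervisor_backlog` are invented for illustration. Because of the rounding in assumption (1), 16 × .30 = 4.8 becomes 4 supervisors, so the sketch lands at 8 of 16 slots rather than the unrounded .60, but the outcome is the same: capacity is never full, so nothing triggers a raise.

```python
import math


def schedule_tick(max_concurrency, supervisor_fraction, supervisor_backlog,
                  children_per_supervisor=1):
    """Model one scheduling pass (hypothetical simplification)."""
    # Assumption: the supervisor limit is a floored fraction of max_concurrency.
    max_supervisors = math.floor(max_concurrency * supervisor_fraction)
    supervisors = min(supervisor_backlog, max_supervisors)
    # Child containers can only use the slots the supervisors leave free.
    children = min(supervisors * children_per_supervisor,
                   max_concurrency - supervisors)
    running = supervisors + children
    # Assumption: max_concurrency is only raised once every slot is in use.
    would_raise = running >= max_concurrency
    return running, would_raise


for fraction in (0.30, 0.50):
    running, would_raise = schedule_tick(16, fraction, supervisor_backlog=100)
    print(f"SupervisorFraction={fraction:.2f}: {running}/16 running, raise={would_raise}")
# SupervisorFraction=0.30: 8/16 running, raise=False
# SupervisorFraction=0.50: 16/16 running, raise=True
```

With a fraction of .50, one-child workflows exactly fill maxConcurrency, which is why a 50% default would let the (assumed) "raise when full" rule fire.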
I think the answer is that the default value of SupervisorFraction has to be 50%,
and/or the default value of InitialQuotaEstimate should be 0 (which sets it to match MaxInstances).
I didn't see this in the scale cluster test, but there I had already set SupervisorFraction to 0.45 and InitialQuotaEstimate to 400, and the synthetic workflow has a phase where it submits two parallel jobs, which would push the queue up to the maximum.
So I wonder whether, without the parallel job phase, it would have been stuck at 400.
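The scale-test reasoning can be checked with the same kind of hypothetical model. This sketch takes the rule stated above (an InitialQuotaEstimate of 0 means "start at MaxInstances") and again assumes a floored supervisor limit and a "raise only when full" rule; `initial_max_concurrency`, `tick`, and the MaxInstances value of 2000 are invented for illustration. With 0.45 and a starting limit of 400, one-child workflows top out at 360 of 400 slots, while two parallel children per workflow fill all 400.

```python
import math


def initial_max_concurrency(initial_quota_estimate, max_instances):
    # Per the issue text: an estimate of 0 means "start at MaxInstances".
    return max_instances if initial_quota_estimate == 0 else initial_quota_estimate


def tick(max_concurrency, supervisor_fraction, children_per_supervisor,
         backlog=10_000):
    # Assumption: floored supervisor limit, children fill the remaining slots.
    max_supervisors = math.floor(max_concurrency * supervisor_fraction)
    supervisors = min(backlog, max_supervisors)
    children = min(supervisors * children_per_supervisor,
                   max_concurrency - supervisors)
    running = supervisors + children
    # Assumption: the limit is only raised once every slot is in use.
    return running, running >= max_concurrency


start = initial_max_concurrency(400, max_instances=2000)  # 2000 is hypothetical
for children in (1, 2):
    running, would_raise = tick(start, 0.45, children)
    print(f"{children} child(ren) per workflow: {running}/{start} running, raise={would_raise}")
# 1 child(ren) per workflow: 360/400 running, raise=False
# 2 child(ren) per workflow: 400/400 running, raise=True
```

Under these assumptions, a workflow without the parallel phase would sit at 360 of 400 indefinitely, while `initial_max_concurrency(0, max_instances=2000)` would start the limit at 2000 and sidestep the problem.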
Updated by Peter Amstutz over 1 year ago
20894-instances-default @ 0787f110e5ed61b358901fe38796e7ef0008deac
Updated by Peter Amstutz over 1 year ago
20894-instances-default @ df5350ac0dcd52a5c8d6d0d76da49461e30e6ba3
Updated by Peter Amstutz over 1 year ago
20894-instances-default @ 15e554b5da9d4da9447e547abd536c53be069a21
Updated by Peter Amstutz over 1 year ago
- Status changed from In Progress to Resolved