Bug #17776
Status: closed
[a-d-c] [ec2] When InsufficientInstanceCapacity is returned, we should throttle node creation.
Story points: -
Release:
Release relationship: Auto
Updated by Ward Vandewege over 3 years ago
- Target version changed from To Be Groomed to 2021-06-23 sprint
- Assigned To set to Ward Vandewege
- Status changed from New to In Progress
A very basic approach is in commit 66d3cb88d07eed627903b6db0b1cffb7491d4e34 on branch 17776-more-throttling.
Updated by Ward Vandewege over 3 years ago
- Related to Bug #17777: [a-d-c] [ec2] MaxSpotInstanceCountExceeded should throttle creation attempts for preemptible instances added
Updated by Ward Vandewege over 3 years ago
- Related to Bug #17783: [a-d-c] [ec2] VcpuLimitExceeded should throttle node creation attempts added
Updated by Tom Clegg over 3 years ago
For detecting the error:
- I don't think we want to export IsErrorCapacity.
- The extra isCodeCapacity func seems needlessly verbose all for the sake of saving a few bytes of an unchanging map. Could just do
var isCodeCapacity = map[string]bool{"InsufficientInstanceCapacity": true, ...}
- These errors seem more like quota errors than API request limit errors. We have a different interface for quota errors (IsQuotaError() bool), the Azure driver has an example. That way the dispatcher can shut down idle nodes in an effort to free up capacity.
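To make the two suggestions above concrete, here is a minimal Go sketch, not the code from the branch. It combines the unexported isCodeCapacity map with a wrapper type that satisfies an IsQuotaError() bool interface. The names codedError, fakeEC2Error, quotaError, wrapCapacityError and isQuotaError are hypothetical and exist only for illustration. The map holds only the one code named in this ticket.

```go
package main

import "errors"

// isCodeCapacity marks EC2 error codes that indicate a capacity problem.
// Only the code named in this ticket is listed here.
var isCodeCapacity = map[string]bool{
	"InsufficientInstanceCapacity": true,
}

// codedError is a hypothetical stand-in for an SDK error that exposes
// an error code string.
type codedError interface {
	error
	Code() string
}

// fakeEC2Error is a hypothetical error type used only to exercise the sketch.
type fakeEC2Error struct{ code string }

func (e fakeEC2Error) Error() string { return e.code + ": simulated error" }
func (e fakeEC2Error) Code() string  { return e.code }

// quotaError wraps an error and reports it as a quota error.
type quotaError struct{ error }

func (quotaError) IsQuotaError() bool { return true }

// wrapCapacityError returns err wrapped as a quotaError when its code is
// listed in isCodeCapacity, and returns err unchanged otherwise.
func wrapCapacityError(err error) error {
	var ce codedError
	if errors.As(err, &ce) && isCodeCapacity[ce.Code()] {
		return quotaError{err}
	}
	return err
}

// isQuotaError shows how a caller could detect the condition without
// depending on the driver's concrete error type.
func isQuotaError(err error) bool {
	var qe interface{ IsQuotaError() bool }
	return errors.As(err, &qe) && qe.IsQuotaError()
}
```

With this shape, the caller only needs the small IsQuotaError interface, so the driver can keep both the map and the wrapper type unexported.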
Updated by Ward Vandewege over 3 years ago
Tom Clegg wrote:
> For detecting the error:
> - I don't think we want to export IsErrorCapacity.
> - The extra isCodeCapacity func seems needlessly verbose all for the sake of saving a few bytes of an unchanging map. Could just do
> var isCodeCapacity = map[string]bool{"InsufficientInstanceCapacity": true, ...}
Yes, all fixed, thanks.
> For reporting it back to dispatcher:
> - These errors seem more like quota errors than API request limit errors. We have a different interface for quota errors (IsQuotaError() bool), the Azure driver has an example. That way the dispatcher can shut down idle nodes in an effort to free up capacity.
Thanks! I've updated the branch accordingly. I've also added a basic test for wrapError in the ec2 driver. See 6bb5a84a53e5810e96e56e41cc751d4ebc054580 on branch 17776-more-throttling.
Tests in developer-run-tests: #2527
Updated by Ward Vandewege over 3 years ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|278b10cec053b16cba91c41ed2b978b9449230f7.