Bug #17776

[a-d-c] [ec2] when InsufficientInstanceCapacity is returned, we should throttle node creation.

Added by Ward Vandewege 4 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 06/10/2021
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Subtasks

Task #17784: review 17776-more-throttling (Resolved, Ward Vandewege)


Related issues

Related to Arvados - Bug #17777: [a-d-c] [ec2] MaxSpotInstanceCountExceeded should throttle creation attempts for preemptible instances (Resolved)

Related to Arvados - Bug #17783: [a-d-c] [ec2] VcpuLimitExceeded should throttle node creation attempts (Resolved)

Associated revisions

Revision 278b10ce
Added by Ward Vandewege 4 months ago

Merge branch '17776-more-throttling'

closes #17776
closes #17777
closes #17783

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

History

#1 Updated by Ward Vandewege 4 months ago

  • Target version changed from To Be Groomed to 2021-06-23 sprint
  • Assigned To set to Ward Vandewege
  • Status changed from New to In Progress

A very basic first approach is at commit 66d3cb88d07eed627903b6db0b1cffb7491d4e34 on branch 17776-more-throttling.

#2 Updated by Ward Vandewege 4 months ago

  • Related to Bug #17777: [a-d-c] [ec2] MaxSpotInstanceCountExceeded should throttle creation attempts for preemptible instances added

#3 Updated by Ward Vandewege 4 months ago

  • Related to Bug #17783: [a-d-c] [ec2] VcpuLimitExceeded should throttle node creation attempts added

#4 Updated by Tom Clegg 4 months ago

For detecting the error:
  • I don't think we want to export IsErrorCapacity.
  • The extra isCodeCapacity func seems needlessly verbose, all for the sake of saving a few bytes of an unchanging map. Could just do var isCodeCapacity = map[string]bool{"InsufficientInstanceCapacity": true, ...}
For reporting it back to the dispatcher:
  • These errors seem more like quota errors than API request limit errors. We have a different interface for quota errors (IsQuotaError() bool), the Azure driver has an example. That way the dispatcher can shut down idle nodes in an effort to free up capacity.

#5 Updated by Ward Vandewege 4 months ago

Tom Clegg wrote:

> For detecting the error:
> • I don't think we want to export IsErrorCapacity.
> • The extra isCodeCapacity func seems needlessly verbose, all for the sake of saving a few bytes of an unchanging map. Could just do var isCodeCapacity = map[string]bool{"InsufficientInstanceCapacity": true, ...}

Yes, all fixed, thanks.

> For reporting it back to the dispatcher:
> • These errors seem more like quota errors than API request limit errors. We have a different interface for quota errors (IsQuotaError() bool), the Azure driver has an example. That way the dispatcher can shut down idle nodes in an effort to free up capacity.

Thanks! I've updated the branch accordingly. I've also added a basic test for wrapError in the ec2 driver. See 6bb5a84a53e5810e96e56e41cc751d4ebc054580 on branch 17776-more-throttling.

Tests in https://ci.arvados.org/view/Developer/job/developer-run-tests/2527/

#6 Updated by Tom Clegg 4 months ago

LGTM, thanks!

#7 Updated by Ward Vandewege 4 months ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

#8 Updated by Ward Vandewege 4 months ago

  • Release set to 39
