Bug #17776

closed

[a-d-c] [ec2] when InsufficientInstanceCapacity is returned, we should throttle node creation.

Added by Ward Vandewege over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -
Release relationship: Auto

Subtasks 1 (0 open, 1 closed)

Task #17784: review 17776-more-throttling (Resolved, Ward Vandewege, 06/10/2021)

Related issues 2 (0 open, 2 closed)

Related to Arvados - Bug #17777: [a-d-c] [ec2] MaxSpotInstanceCountExceeded should throttle creation attempts for preemptible instances (Resolved, Ward Vandewege)
Related to Arvados - Bug #17783: [a-d-c] [ec2] VcpuLimitExceeded should throttle node creation attempts (Resolved, Ward Vandewege)
Actions #1

Updated by Ward Vandewege over 3 years ago

  • Target version changed from To Be Groomed to 2021-06-23 sprint
  • Assigned To set to Ward Vandewege
  • Status changed from New to In Progress

A very basic approach is at 66d3cb88d07eed627903b6db0b1cffb7491d4e34 on branch 17776-more-throttling.

Actions #2

Updated by Ward Vandewege over 3 years ago

  • Related to Bug #17777: [a-d-c] [ec2] MaxSpotInstanceCountExceeded should throttle creation attempts for preemptible instances added
Actions #3

Updated by Ward Vandewege over 3 years ago

  • Related to Bug #17783: [a-d-c] [ec2] VcpuLimitExceeded should throttle node creation attempts added
Actions #4

Updated by Tom Clegg over 3 years ago

For detecting the error:
  • I don't think we want to export IsErrorCapacity.
  • The extra isCodeCapacity func seems needlessly verbose all for the sake of saving a few bytes of an unchanging map. Could just do var isCodeCapacity = map[string]bool{"InsufficientInstanceCapacity": true, ...}
For reporting it back to dispatcher:
  • These errors seem more like quota errors than API request limit errors. We have a different interface for quota errors (IsQuotaError() bool), the Azure driver has an example. That way the dispatcher can shut down idle nodes in an effort to free up capacity.
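
The two suggestions above could fit together roughly as follows. This is only a sketch, not the code on the branch: the interface method IsQuotaError() bool and the map name isCodeCapacity come from this thread, while codedError, capacityError, fakeEC2Error, wrapIfCapacity and isQuota are hypothetical names made up for illustration. Putting the error codes from the related tickets (#17777, #17783) into the same map is also an assumption.

```go
package main

import "errors"

// isCodeCapacity is a plain lookup table of EC2 error codes treated as
// capacity problems. Only InsufficientInstanceCapacity is named in this
// ticket; the other two are taken from the related tickets (assumption).
var isCodeCapacity = map[string]bool{
	"InsufficientInstanceCapacity": true,
	"MaxSpotInstanceCountExceeded": true,
	"VcpuLimitExceeded":            true,
}

// codedError is a hypothetical stand-in for an SDK error that exposes
// its error code as a string.
type codedError interface {
	error
	Code() string
}

// quotaError is the interface shape mentioned in the thread.
type quotaError interface {
	IsQuotaError() bool
}

// capacityError (hypothetical) wraps an error and marks it as quota-like.
type capacityError struct{ error }

func (capacityError) IsQuotaError() bool { return true }

// fakeEC2Error is a hypothetical error type used only to exercise the sketch.
type fakeEC2Error struct{ code string }

func (e fakeEC2Error) Error() string { return "ec2: " + e.code }
func (e fakeEC2Error) Code() string  { return e.code }

// wrapIfCapacity returns err wrapped as a quota error when its code is
// listed in isCodeCapacity, and returns err unchanged otherwise.
func wrapIfCapacity(err error) error {
	var ce codedError
	if errors.As(err, &ce) && isCodeCapacity[ce.Code()] {
		return capacityError{err}
	}
	return err
}

// isQuota shows how a caller such as the dispatcher might detect the
// quota condition without knowing the concrete error type.
func isQuota(err error) bool {
	var qe quotaError
	return errors.As(err, &qe) && qe.IsQuotaError()
}
```

With this shape, the detection helper stays unexported, and a caller only needs the IsQuotaError() check to decide whether shutting down idle nodes might free up capacity.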
Actions #5

Updated by Ward Vandewege over 3 years ago

Tom Clegg wrote:

For detecting the error:
  • I don't think we want to export IsErrorCapacity.
  • The extra isCodeCapacity func seems needlessly verbose all for the sake of saving a few bytes of an unchanging map. Could just do var isCodeCapacity = map[string]bool{"InsufficientInstanceCapacity": true, ...}

Yes, all fixed, thanks.

For reporting it back to dispatcher:
  • These errors seem more like quota errors than API request limit errors. We have a different interface for quota errors (IsQuotaError() bool), the Azure driver has an example. That way the dispatcher can shut down idle nodes in an effort to free up capacity.

Thanks! I've updated the branch accordingly. I've also added a basic test for wrapError in the ec2 driver. See 6bb5a84a53e5810e96e56e41cc751d4ebc054580 on branch 17776-more-throttling.

Tests in developer-run-tests: #2527

Actions #6

Updated by Tom Clegg over 3 years ago

LGTM, thanks!

Actions #7

Updated by Ward Vandewege over 3 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #8

Updated by Ward Vandewege over 3 years ago

  • Release set to 39