Bug #17529

[a-d-c] AWS/EC2 driver should return a RateLimitError to dispatcher if the upstream error is RequestLimitExceeded

Added by Tom Clegg 8 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Start date:
04/14/2021
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Current code results in error logs like this:

{"InstanceType":"c5large.spot","PID":2231,"error":"RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 778d5b22-90e4-4a48-8f09-6946e8edcb2c","level":"error","msg":"create failed","time":"2021-03-30T13:38:38.925565910Z"}

...but the returned error does not implement the lib/cloud.RateLimitError interface, so the dispatcher doesn't back off.


Subtasks

Task #17545: Review 17529-ec2-rate-limitResolvedTom Clegg


Related issues

Related to Arvados - Bug #17561: [arvados-dispatch-cloud] inst.SetTags() and inst.Destroy() should respect rate-limiting responses from cloud providerNew

Associated revisions

Revision e96a1b2b
Added by Tom Clegg 8 months ago

Merge branch '17529-ec2-rate-limit'

fixes #17529

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 965e71f8
Added by Tom Clegg 8 months ago

Merge branch '17529-listinstances-rate-limit'

refs #17529

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Clegg 8 months ago

  • Assigned To set to Tom Clegg

#2 Updated by Tom Clegg 8 months ago

  • Status changed from New to In Progress

#3 Updated by Ward Vandewege 8 months ago

Tom Clegg wrote:

17529-ec2-rate-limit @ ff11506c916cb2cd8abd1905e16c4d4f5ddd4240 -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2412/

Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented

This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?

Otherwise, LGTM, thanks.

#4 Updated by Tom Clegg 8 months ago

Ward Vandewege wrote:

Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented

Done with refactor.

This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?

Create and Instances() are currently the only methods whose caller (in a-d-c) will actually notice the don't-retry-until advice. Updated branch so Instances() is covered too.

We could also handle SetTags and Destroy (perhaps even reducing MaxCloudOpsPerSecond on the fly as well as logging something) ... but that would be a somewhat bigger fix, and (I suspect) not quite as important practically speaking.

That said, I noticed SetTags() wasn't even obeying MaxCloudOpsPerSecond in a-d-c, and that was trivial to fix.

17529-ec2-rate-limit @ 47cbcd47789be77f3a1c44ba605853f50c448e8a -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2419/

#5 Updated by Ward Vandewege 8 months ago

Tom Clegg wrote:

Ward Vandewege wrote:

Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented

Done with refactor.

This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?

Create and Instances() are currently the only methods whose caller (in a-d-c) will actually notice the don't-retry-until advice. Updated branch so Instances() is covered too.

We could also handle SetTags and Destroy (perhaps even reducing MaxCloudOpsPerSecond on the fly as well as logging something) ... but that would be a somewhat bigger fix, and (I suspect) not quite as important practically speaking.

That said, I noticed SetTags() wasn't even obeying MaxCloudOpsPerSecond in a-d-c, and that was trivial to fix.

17529-ec2-rate-limit @ 47cbcd47789be77f3a1c44ba605853f50c448e8a -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2419/

Cool, thanks for that additional fix in SetTags(). This LGTM; should we create a follow-on ticket to make a-d-c notice the don't-retry-until behavior for SetTags() and Destroy(), too (and the corresponding changes in the cloud drivers) ?

#6 Updated by Tom Clegg 8 months ago

  • Related to Bug #17561: [arvados-dispatch-cloud] inst.SetTags() and inst.Destroy() should respect rate-limiting responses from cloud provider added

#7 Updated by Tom Clegg 8 months ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

#8 Updated by Peter Amstutz 7 months ago

  • Release set to 38

Also available in: Atom PDF