Bug #17529
closed
[a-d-c] AWS/EC2 driver should return a RateLimitError to dispatcher if the upstream error is RequestLimitExceeded
Added by Tom Clegg almost 4 years ago.
Updated over 3 years ago.
Release relationship:
Auto
Description
Current code results in error logs like this:
{"InstanceType":"c5large.spot","PID":2231,"error":"RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 778d5b22-90e4-4a48-8f09-6946e8edcb2c","level":"error","msg":"create failed","time":"2021-03-30T13:38:38.925565910Z"}
...but the returned error does not implement the lib/cloud.RateLimitError interface, so the dispatcher doesn't back off.
Related issues
1 (1 open — 0 closed)
- Assigned To set to Tom Clegg
- Status changed from New to In Progress
Tom Clegg wrote:
17529-ec2-rate-limit @ ff11506c916cb2cd8abd1905e16c4d4f5ddd4240 -- developer-run-tests: #2412
Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented
This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?
Otherwise, LGTM, thanks.
Ward Vandewege wrote:
Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented
Done with refactor.
This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?
Create and Instances() are currently the only methods whose caller (in a-d-c) will actually notice the don't-retry-until advice. Updated branch so Instances() is covered too.
We could also handle SetTags and Destroy (perhaps even reducing MaxCloudOpsPerSecond on the fly as well as logging something) ... but that would be a somewhat bigger fix, and (I suspect) not quite as important practically speaking.
That said, I noticed SetTags() wasn't even obeying MaxCloudOpsPerSecond in a-d-c, and that was trivial to fix.
17529-ec2-rate-limit @ 47cbcd47789be77f3a1c44ba605853f50c448e8a -- developer-run-tests: #2419
Tom Clegg wrote:
Ward Vandewege wrote:
Nit: in `func newEC2InstanceSet` the last "else" can be dropped and the block outdented
Done with refactor.
This looks good for the "RunInstances" call; should we also add this for the other function interfaces that ec2Interface exposes?
Create and Instances() are currently the only methods whose caller (in a-d-c) will actually notice the don't-retry-until advice. Updated branch so Instances() is covered too.
We could also handle SetTags and Destroy (perhaps even reducing MaxCloudOpsPerSecond on the fly as well as logging something) ... but that would be a somewhat bigger fix, and (I suspect) not quite as important practically speaking.
That said, I noticed SetTags() wasn't even obeying MaxCloudOpsPerSecond in a-d-c, and that was trivial to fix.
17529-ec2-rate-limit @ 47cbcd47789be77f3a1c44ba605853f50c448e8a -- developer-run-tests: #2419
Cool, thanks for that additional fix in SetTags(). This LGTM; should we create a follow-on ticket to make a-d-c notice the don't-retry-until behavior for SetTags() and Destroy(), too (and the corresponding changes in the cloud drivers) ?
- Related to Bug #17561: [arvados-dispatch-cloud] inst.SetTags() and inst.Destroy() should respect rate-limiting responses from cloud provider added
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Also available in: Atom
PDF