Feature #21460
open
spot instance reclamation triggers "at capacity" cooloff
Description
When a spot instance is reclaimed, we don't want to treat it as an instance failure and retry immediately.
Instead, we want to wait a little bit (should be configurable, maybe just add capacityErrorTTL to the config file?) before trying to acquire that instance type again.
When a container indicates that it was cancelled because its instance was reclaimed, the spot instance type it was running on should be marked as "at capacity".
If an attempt to allocate a spot instance fails with a "can't get spot instance" error, we should also set the "at capacity" state.
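A minimal sketch of the cooloff described above, assuming a per-instance-type expiry map. The capacityTracker type and its methods are hypothetical names for illustration, not the dispatcher's actual code, and the TTL is assumed to come from the proposed capacityErrorTTL config entry.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// capacityTracker remembers which instance types were recently found to be
// "at capacity". Hypothetical helper for illustration only.
type capacityTracker struct {
	mu    sync.Mutex
	ttl   time.Duration        // cooloff period, e.g. from a capacityErrorTTL config entry (assumed)
	until map[string]time.Time // instance type -> end of its cooloff
}

func newCapacityTracker(ttl time.Duration) *capacityTracker {
	return &capacityTracker{ttl: ttl, until: make(map[string]time.Time)}
}

// MarkAtCapacity starts (or restarts) the cooloff for an instance type.
// It would be called on a reclamation report or a "can't get spot instance" error.
func (t *capacityTracker) MarkAtCapacity(instanceType string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.until[instanceType] = time.Now().Add(t.ttl)
}

// AtCapacity reports whether the instance type is still inside its cooloff.
func (t *capacityTracker) AtCapacity(instanceType string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	end, ok := t.until[instanceType]
	if !ok {
		return false
	}
	if !time.Now().Before(end) {
		delete(t.until, instanceType) // cooloff expired, forget the entry
		return false
	}
	return true
}

func main() {
	tracker := newCapacityTracker(time.Minute)
	tracker.MarkAtCapacity("m5.large.spot")
	fmt.Println("m5.large.spot at capacity:", tracker.AtCapacity("m5.large.spot"))
	fmt.Println("c5.large.spot at capacity:", tracker.AtCapacity("c5.large.spot"))
}
```

A scheduler could consult AtCapacity before requesting a new instance of a given type and skip that type until the cooloff expires, instead of retrying immediately.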
notes:
This is what it does when a preemption notice happens. The documentation suggests checking for the existence of the preemptionNotice key in the container's runtime_status.
runner.updateRuntimeStatus(arvadosclient.Dict{
    "warning":          "preemption notice",
    "warningDetail":    text,
    "preemptionNotice": text,
})
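The check suggested in the note could look roughly like this. It is a sketch only: wasPreempted is a hypothetical helper, and runtime_status is assumed to have already been decoded into a plain Go map.

```go
package main

import "fmt"

// wasPreempted reports whether a container's runtime_status contains the
// preemptionNotice key. Only the key's existence is checked, as the note
// suggests; the value (the notice text) is ignored.
// Hypothetical helper, not part of the Arvados codebase.
func wasPreempted(runtimeStatus map[string]interface{}) bool {
	_, ok := runtimeStatus["preemptionNotice"]
	return ok
}

func main() {
	status := map[string]interface{}{
		"warning":          "preemption notice",
		"warningDetail":    "instance is scheduled for reclamation",
		"preemptionNotice": "instance is scheduled for reclamation",
	}
	fmt.Println("preempted:", wasPreempted(status))
	fmt.Println("preempted:", wasPreempted(map[string]interface{}{}))
}
```

When this returns true for a cancelled container, the instance type it ran on would be the one to mark as "at capacity".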
Related issues
Updated by Peter Amstutz 10 months ago
- Subject changed from Put a temporary hold on an instance type when spot instance reclamation is detected to spot instance reclamation is triggers "at capacity" cooloff
Updated by Peter Amstutz 10 months ago
- Related to Idea #18179: Better spot instance support added
Updated by Peter Amstutz 10 months ago
- Target version changed from Future to Development 2024-03-27 sprint
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-03-27 sprint to Development 2024-04-24 sprint
Updated by Peter Amstutz 8 months ago
- Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Updated by Peter Amstutz 8 months ago
- Target version changed from Development 2024-05-08 sprint to Development 2024-06-05 sprint
Updated by Peter Amstutz 7 months ago
- Target version changed from Development 2024-06-05 sprint to Future