Idea #18179
closed
Better spot instance support
Added by Peter Amstutz over 3 years ago.
Updated 2 days ago.
Release relationship:
Auto
Description
- Currently sitewide on/off choice, can't choose per-workflow
- Have to duplicate instance types in the config (obnoxious) (see #18596)
- Records the wrong price (uses the price from the instance type config, not the actual price reported by the cloud)
- Scheduling choices are too narrow; should be able to request different node types when the node type you want isn't available
- Could we query spot prices on the fly to make scheduling decisions?
- Try bigger instance types but only bid the spot price for the smallest node type
- Should eventually escalate to an on-demand instance if spot instance isn't available
- User should be able to communicate cost tolerance
- Want to try other availability zones, but requires feature of Keepstore running on compute nodes (#16516)
- Need a better way to handle spot instance shutdown
- Maybe just always retry on a regular cost node
- Consider shutting down spot instances after a job because there is a timer?
- Need to research this more
- Can the VM be frozen / restored?
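
The scheduling ideas above (try other node types, honor a user cost tolerance, eventually escalate to on-demand) could be sketched roughly as follows. This is only an illustration under stated assumptions: `Offer`, `placement_order`, and the prices are hypothetical names and values, not part of Arvados or any cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical description of one purchasable node option."""
    instance_type: str
    vcpus: int
    ram_gib: float
    hourly_price: float
    preemptible: bool  # True for spot, False for on-demand


def placement_order(offers, min_vcpus, min_ram_gib, tolerance_pct):
    """Return the offers to try, in order.

    Spot offers that fit the request and cost at most tolerance_pct
    percent more than the cheapest fitting spot offer come first
    (cheapest first). The cheapest fitting on-demand offer is appended
    as the last resort.
    """
    fitting = [o for o in offers
               if o.vcpus >= min_vcpus and o.ram_gib >= min_ram_gib]
    spot = [o for o in fitting if o.preemptible]
    on_demand = [o for o in fitting if not o.preemptible]

    order = []
    if spot:
        cheapest = min(o.hourly_price for o in spot)
        ceiling = cheapest * (1 + tolerance_pct / 100)
        order.extend(sorted((o for o in spot if o.hourly_price <= ceiling),
                            key=lambda o: o.hourly_price))
    if on_demand:
        # Escalation step: fall back to a regular-cost node.
        order.append(min(on_demand, key=lambda o: o.hourly_price))
    return order
```

A scheduler following this sketch would walk the returned list and stop at the first offer the cloud can actually provide. The prices fed in could come from the config or from a live price query; the sketch does not assume either.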
- Start date set to 11/01/2021
- Due date set to 03/31/2022
- Description updated (diff)
- Related to Feature #18180: Ability to control use of spot instances on a per-workflow and step level added
- Related to Feature #18181: Ability to specify a % of compute instance price that user is willing to go over from cheapest added
- Description updated (diff)
- Related to Feature #17695: [costanalyzer] make an accurate report for spot instances on AWS added
- Blocked by Feature #18205: [api] [cloud] add compute instance price to container record added
- Start date changed from 11/01/2021 to 01/01/2022
- Start date changed from 01/01/2022 to 05/01/2022
- Due date changed from 03/31/2022 to 07/31/2022
- Related to Bug #18101: [a-d-c] [AWS] add option to spin up (spot) instances in more/all availability zones in the region added
- Related to Feature #18596: Config option to enable preemptible variants of all instance types added
- Related to Bug #18562: [api] should not change the preemptible flag across the board added
- Description updated (diff)
- Start date changed from 05/01/2022 to 03/01/2022
- Due date changed from 07/31/2022 to 08/31/2022
- Status changed from New to In Progress
- Due date changed from 08/31/2022 to 09/30/2022
- Due date changed from 09/30/2022 to 11/30/2022
- Start date changed from 03/01/2022 to 01/01/2023
- Due date changed from 11/30/2022 to 04/30/2023
- Start date changed from 01/01/2023 to 09/01/2022
- Start date changed from 09/01/2022 to 03/01/2022
- Related to Feature #19961: Detect and log spot instance interruption notices added
- Related to Feature #19320: Get actual instance price information by calling AWS APIs added
- Related to Feature #19982: Ability to know when a container died because of spot instance reclamation and option to resubmit added
- Due date changed from 04/30/2023 to 05/31/2023
- Due date changed from 05/31/2023 to 07/31/2023
- Related to Feature #16316: a-c-r handles resource range requests (especially CPU) and adjusts requests based on what is in InstanceTypes list added
- Related to Feature #19675: Panel that lists configured instance types added
- Due date changed from 07/31/2023 to 09/30/2023
- Related to Feature #20979: Research spot instance retry strategies added
- Due date changed from 09/30/2023 to 12/31/2023
- Related to Feature #20978: Support multiple candidate instance types to assign containers added
- Due date changed from 12/31/2023 to 03/31/2024
- Related to Feature #21460: spot instance reclamation is triggers "at capacity" cooloff added
- Due date changed from 03/31/2024 to 06/30/2024
- Target version set to Future
- Status changed from In Progress to Resolved
I actually think this can be considered resolved: we have had users running spot instances successfully in production for a long time now.