Idea #18179

open

Better spot instance support

Added by Peter Amstutz about 3 years ago. Updated 7 months ago.

Status: In Progress
Priority: Normal
Assigned To: -
Target version:
Start date: 03/01/2022
Due date: 06/30/2024 (about 5 months late)
Story points: -
Release:
Release relationship: Auto

Description

  • Spot instance use is currently a sitewide on/off choice; it can't be chosen per workflow
  • Instance types have to be duplicated in the config (obnoxious) (see #18596)
  • The wrong price is recorded (the price from the instance type config is used, not actual information from the cloud)
  • Scheduling choices are too narrow; it should be possible to request different node types when the node you want isn't available
    • Could we query spot prices on the fly to make scheduling decisions?
    • Try bigger instance types, but only bid the spot price for the smallest node type
    • Should eventually escalate to an on-demand instance if a spot instance isn't available
  • User should be able to communicate cost tolerance
  • Want to try other availability zones, but that requires the feature of Keepstore running on compute nodes (#16516)
  • Need better way to handle spot instance shutdown
    • Maybe just always retry on a regular cost node
  • Consider shutting down spot instances after a job because there is a timer?
    • Need to research this more
  • Can the VM be frozen / restored?
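
To make the scheduling idea above concrete, here is a rough sketch of one possible selection rule. It is not how the Arvados dispatcher works; `InstanceType`, `choose_instance`, and the `tolerance` parameter are hypothetical names made up for illustration, and "smallest" is assumed to mean the cheapest on-demand type that fits the request. The bid ceiling is the spot price of that smallest type, optionally raised by a user-supplied cost tolerance, and the rule escalates to on-demand when no spot type is available under the ceiling.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class InstanceType:
    # Hypothetical record; field names are illustrative, not Arvados config keys.
    name: str
    vcpus: int
    ram_gib: float
    on_demand_price: float  # USD per hour
    spot_price: float       # current spot price, USD per hour
    spot_available: bool    # whether spot capacity can be obtained right now


def choose_instance(
    types: Sequence[InstanceType],
    vcpus: int,
    ram_gib: float,
    tolerance: float = 0.0,
) -> Optional[Tuple[InstanceType, str]]:
    """Pick an instance type and purchase mode ("spot" or "on-demand").

    Returns None if no configured type satisfies the request.
    """
    fitting = [t for t in types if t.vcpus >= vcpus and t.ram_gib >= ram_gib]
    if not fitting:
        return None

    # Assumption: the "smallest" node type is the cheapest on-demand type that fits.
    smallest = min(fitting, key=lambda t: t.on_demand_price)

    # Bid no more than the smallest type's spot price, plus the user's tolerance
    # (0.2 means "willing to pay up to 20% more").
    ceiling = smallest.spot_price * (1.0 + tolerance)

    # Bigger types are acceptable as long as their spot price is under the ceiling.
    spot = [t for t in fitting if t.spot_available and t.spot_price <= ceiling]
    if spot:
        return min(spot, key=lambda t: t.spot_price), "spot"

    # No spot capacity within budget: escalate to an on-demand instance.
    return smallest, "on-demand"
```

In this sketch the spot prices are plain inputs; querying them on the fly and deciding when to give up on spot entirely are left open, as in the notes above.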

Related issues

Related to Arvados - Feature #18180: Ability to control use of spot instances on a per-workflow and step level (Resolved, Peter Amstutz, 03/17/2022)
Related to Arvados - Feature #18181: Ability to specify a % of compute instance price that user is willing to go over from cheapest (New)
Related to Arvados - Feature #17695: [costanalyzer] make an accurate report for spot instances on AWS (New)
Related to Arvados - Bug #18101: [a-d-c] [AWS] add option to spin up (spot) instances in more/all availability zones in the region (Resolved)
Related to Arvados - Feature #18596: Config option to enable preemptible variants of all instance types (Resolved, Tom Clegg, 03/21/2022)
Related to Arvados - Bug #18562: [api] should not change the preemptible flag across the board (Resolved, Tom Clegg, 12/23/2021)
Related to Arvados - Feature #19961: Detect and log spot instance interruption notices (Resolved, Tom Clegg, 02/16/2023)
Related to Arvados - Feature #19320: Get actual instance price information by calling AWS APIs (Resolved, Tom Clegg, 01/23/2023)
Related to Arvados - Feature #19982: Ability to know when a container died because of spot instance reclamation and option to resubmit (Resolved, Peter Amstutz)
Related to Arvados - Feature #16316: a-c-r handles resource range requests (especially CPU) and adjusts requests based on what is in InstanceTypes list (In Progress)
Related to Arvados - Feature #19675: Panel that lists configured instance types (Resolved, Stephen Smith, 12/05/2023)
Related to Arvados - Feature #20979: Research spot instance retry strategies (New, Sarah Zaranek)
Related to Arvados - Feature #20978: Support multiple candidate instance types to assign containers (Resolved, Tom Clegg, 10/31/2023)
Related to Arvados - Feature #21460: spot instance reclamation is triggers "at capacity" cooloff (New)
Blocked by Arvados - Feature #18205: [api] [cloud] add compute instance price to container record (Resolved, Tom Clegg, 08/08/2022)
Actions #1

Updated by Peter Amstutz about 3 years ago

  • Start date set to 11/01/2021
  • Due date set to 03/31/2022
Actions #2

Updated by Peter Amstutz about 3 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 3 years ago

  • Related to Feature #18180: Ability to control use of spot instances on a per-workflow and step level added
Actions #4

Updated by Peter Amstutz about 3 years ago

  • Related to Feature #18181: Ability to specify a % of compute instance price that user is willing to go over from cheapest added
Actions #5

Updated by Ward Vandewege about 3 years ago

  • Description updated (diff)
Actions #6

Updated by Ward Vandewege about 3 years ago

  • Related to Feature #17695: [costanalyzer] make an accurate report for spot instances on AWS added
Actions #7

Updated by Ward Vandewege about 3 years ago

  • Blocked by Feature #18205: [api] [cloud] add compute instance price to container record added
Actions #8

Updated by Peter Amstutz about 3 years ago

  • Start date changed from 11/01/2021 to 01/01/2022
Actions #9

Updated by Peter Amstutz about 3 years ago

  • Start date changed from 01/01/2022 to 05/01/2022
  • Due date changed from 03/31/2022 to 07/31/2022
Actions #10

Updated by Peter Amstutz almost 3 years ago

  • Related to Bug #18101: [a-d-c] [AWS] add option to spin up (spot) instances in more/all availability zones in the region added
Actions #11

Updated by Ward Vandewege almost 3 years ago

  • Related to Feature #18596: Config option to enable preemptible variants of all instance types added
Actions #12

Updated by Ward Vandewege almost 3 years ago

  • Related to Bug #18562: [api] should not change the preemptible flag across the board added
Actions #13

Updated by Ward Vandewege almost 3 years ago

  • Description updated (diff)
Actions #14

Updated by Peter Amstutz over 2 years ago

  • Start date changed from 05/01/2022 to 03/01/2022
Actions #15

Updated by Peter Amstutz over 2 years ago

  • Due date changed from 07/31/2022 to 08/31/2022
Actions #16

Updated by Peter Amstutz about 2 years ago

  • Status changed from New to In Progress
  • Due date changed from 08/31/2022 to 09/30/2022
Actions #17

Updated by Peter Amstutz about 2 years ago

  • Due date changed from 09/30/2022 to 11/30/2022
Actions #18

Updated by Peter Amstutz about 2 years ago

  • Start date changed from 03/01/2022 to 01/01/2023
  • Due date changed from 11/30/2022 to 04/30/2023
Actions #19

Updated by Peter Amstutz almost 2 years ago

  • Start date changed from 01/01/2023 to 09/01/2022
Actions #20

Updated by Peter Amstutz almost 2 years ago

  • Start date changed from 09/01/2022 to 03/01/2022
Actions #21

Updated by Tom Clegg almost 2 years ago

  • Related to Feature #19961: Detect and log spot instance interruption notices added
Actions #22

Updated by Tom Clegg almost 2 years ago

  • Related to Feature #19320: Get actual instance price information by calling AWS APIs added
Actions #23

Updated by Peter Amstutz over 1 year ago

  • Related to Feature #19982: Ability to know when a container died because of spot instance reclamation and option to resubmit added
Actions #24

Updated by Peter Amstutz over 1 year ago

  • Due date changed from 04/30/2023 to 05/31/2023
Actions #25

Updated by Peter Amstutz over 1 year ago

  • Due date changed from 05/31/2023 to 07/31/2023
Actions #26

Updated by Peter Amstutz over 1 year ago

  • Related to Feature #16316: a-c-r handles resource range requests (especially CPU) and adjusts requests based on what is in InstanceTypes list added
Actions #27

Updated by Peter Amstutz over 1 year ago

  • Related to Feature #19675: Panel that lists configured instance types added
Actions #28

Updated by Peter Amstutz over 1 year ago

  • Due date changed from 07/31/2023 to 09/30/2023
Actions #29

Updated by Peter Amstutz about 1 year ago

  • Related to Feature #20979: Research spot instance retry strategies added
Actions #30

Updated by Peter Amstutz about 1 year ago

  • Due date changed from 09/30/2023 to 12/31/2023
Actions #31

Updated by Peter Amstutz about 1 year ago

  • Related to Feature #20978: Support multiple candidate instance types to assign containers added
Actions #32

Updated by Peter Amstutz 11 months ago

  • Due date changed from 12/31/2023 to 03/31/2024
Actions #33

Updated by Peter Amstutz 10 months ago

  • Related to Feature #21460: spot instance reclamation is triggers "at capacity" cooloff added
Actions #34

Updated by Peter Amstutz 7 months ago

  • Due date changed from 03/31/2024 to 06/30/2024
Actions #35

Updated by Peter Amstutz 7 months ago

  • Target version set to Future