Feature #19982

open

Ability to know when a container died because of spot instance reclamation and option to resubmit

Added by Peter Amstutz about 1 year ago. Updated 1 day ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: CWL
Story points: 3.0

Description

New arvados-cwl-runner behavior when spot instances are enabled

  • When submitting a container to run on a spot instance, don't retry
  • Ability to detect when a container failed due to a reclaimed spot instance (#19961)
  • Exit code to indicate that the workflow failed due to spot instance reclamation
  • Option to automatically re-submit on a reserved instance
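
The behavior listed above could be sketched roughly as follows. This is a minimal illustration only, not the arvados-cwl-runner implementation: the Attempt type, the submit callback, the resubmit_reserved flag, and both exit code values are hypothetical names invented for this example, and the way reclamation is actually detected is left to #19961.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical exit codes; the issue does not specify any values.
EXIT_FAILED = 1
EXIT_SPOT_RECLAIMED = 75


@dataclass
class Attempt:
    """Outcome of one submission (hypothetical type for this sketch)."""
    succeeded: bool
    spot_reclaimed: bool  # True if the failure was caused by spot reclamation


def run_with_spot_fallback(
    submit: Callable[[bool], Attempt],
    resubmit_reserved: bool,
) -> int:
    """Run once on a spot instance, optionally falling back to reserved.

    submit(preemptible) is an assumed callback that runs a single attempt
    and reports its outcome.
    """
    first = submit(True)  # one spot attempt, no automatic retries
    if first.succeeded:
        return 0
    if not first.spot_reclaimed:
        return EXIT_FAILED  # ordinary failure, unrelated to spot reclamation
    if not resubmit_reserved:
        # Report the distinct exit code so the caller can decide what to do.
        return EXIT_SPOT_RECLAIMED
    second = submit(False)  # re-submit on a reserved (non-preemptible) instance
    return 0 if second.succeeded else EXIT_FAILED
```

Keeping the fallback decision in one function makes the two modes easy to compare: without the option the caller sees the distinct exit code, and with it the same failure triggers exactly one reserved re-submission.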

Subtasks 1 (1 open, 0 closed)

Task #20761: Review (New, Peter Amstutz)

Related issues

Related to Arvados - Feature #19975: Option to re-submit container with higher memory request if previous job was killed and crunchstat shows >90% memory usage (Resolved, Peter Amstutz, 03/06/2023)
Related to Arvados - Feature #19974: Option to re-submit preemptible jobs to reserved nodes when previous attempt was interrupted (New)
Related to Arvados Epics - Idea #18179: Better spot instance support (In Progress, 03/01/2022, 03/31/2024)
Related to Arvados - Bug #20606: Unstartable preemptible:true containers should not be reused by non-retryable preemptible:false requests (Resolved, Tom Clegg, 06/27/2023)
Blocked by Arvados - Feature #19961: Detect and log spot instance interruption notices (Resolved, Tom Clegg, 02/16/2023)
#1

Updated by Peter Amstutz about 1 year ago

  • Blocked by Feature #19961: Detect and log spot instance interruption notices added
#2

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
#3

Updated by Peter Amstutz about 1 year ago

  • Category changed from CWL to Crunch
  • Description updated (diff)
#4

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
#5

Updated by Peter Amstutz about 1 year ago

  • Category changed from Crunch to CWL
#6

Updated by Peter Amstutz about 1 year ago

  • Story points set to 2.0
#7

Updated by Peter Amstutz about 1 year ago

  • Story points changed from 2.0 to 3.0
#8

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Future to To be scheduled
#9

Updated by Peter Amstutz about 1 year ago

  • Related to Feature #19975: Option to re-submit container with higher memory request if previous job was killed and crunchstat shows >90% memory usage added
#10

Updated by Peter Amstutz about 1 year ago

  • Related to Feature #19974: Option to re-submit preemptible jobs to reserved nodes when previous attempt was interrupted added
#11

Updated by Peter Amstutz 12 months ago

  • Related to Idea #18179: Better spot instance support added
#12

Updated by Peter Amstutz 8 months ago

  • Target version changed from To be scheduled to Development 2023-08-02 sprint
#13

Updated by Peter Amstutz 8 months ago

  • Assigned To set to Alex Coleman
#14

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2023-08-02 sprint to Development 2023-08-16
#15

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2023-08-16 to Development 2023-08-30
#16

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-08-30 to Development 2023-09-13 sprint
#17

Updated by Brett Smith 7 months ago

  • Related to Bug #20606: Unstartable preemptible:true containers should not be reused by non-retryable preemptible:false requests added
#18

Updated by Brett Smith 7 months ago

We should consider undoing or narrowing the reuse changes we made in #20606 after we implement this. If Arvados gets better at retrying, the reuse narrowing becomes more likely to be wasteful than helpful.

#19

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-09-13 sprint to Development 2023-09-27 sprint
#20

Updated by Peter Amstutz 7 months ago

  • Status changed from New to In Progress
#21

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2023-09-27 sprint to Development 2023-10-11 sprint
#22

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2023-10-11 sprint to Development 2023-10-25 sprint
#23

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2023-10-25 sprint to Development 2023-11-08 sprint
#24

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
#25

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2023-11-29 sprint to Development 2024-01-03 sprint
#26

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
#27

Updated by Peter Amstutz 2 months ago

  • Target version changed from Development 2024-01-17 sprint to Development 2024-01-31 sprint
#28

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Development 2024-01-31 sprint to Development 2024-02-14 sprint
#29

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-02-14 sprint to Development 2024-02-28 sprint
#30

Updated by Peter Amstutz 29 days ago

  • Target version changed from Development 2024-02-28 sprint to Development 2024-03-13 sprint
#31

Updated by Peter Amstutz 15 days ago

  • Target version changed from Development 2024-03-13 sprint to Development 2024-03-27 sprint
#32

Updated by Peter Amstutz 1 day ago

  • Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint