Feature #19982

open

Ability to know when a container died because of spot instance reclamation and option to resubmit

Added by Peter Amstutz about 1 year ago. Updated about 10 hours ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: CWL
Story points: 3.0

Description

New arvados-cwl-runner behavior when spot instances are enabled

  • When submitting to a spot instance, don't retry
  • Ability to detect when a container failed because its spot instance was reclaimed (#19961)
  • Exit code to indicate that the workflow failed due to spot instance reclamation
  • Option to automatically re-submit on a reserved instance
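
The behavior listed above could be sketched roughly as follows. This is only an illustration of the intended control flow, not arvados-cwl-runner code: the Attempt record, the submit callback, the resubmit_reserved option, and the EXIT_SPOT_RECLAIMED value are all hypothetical names chosen for this example, and the issue does not specify an exit code or an option name.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical exit code; the issue does not say which value to use.
EXIT_SPOT_RECLAIMED = 75


@dataclass
class Attempt:
    """Outcome of one submission (hypothetical record for this sketch)."""
    exit_code: int        # 0 means the workflow succeeded
    spot_reclaimed: bool  # True if the failure was caused by spot reclamation


def run_with_spot_policy(submit: Callable[[bool], Attempt],
                         resubmit_reserved: bool = False) -> int:
    """Run once on a spot instance and apply the policy from the description.

    submit(preemptible) is assumed to run one attempt and report its outcome.
    """
    first = submit(True)  # spot (preemptible) attempt, no automatic retry
    if first.exit_code == 0:
        return 0
    if not first.spot_reclaimed:
        # Ordinary failure: report it unchanged.
        return first.exit_code
    if not resubmit_reserved:
        # Tell the caller that the failure was due to spot reclamation.
        return EXIT_SPOT_RECLAIMED
    # Optional behavior: re-submit once on a reserved (non-preemptible) instance.
    return submit(False).exit_code
```

With a distinct exit code, a calling script can decide for itself whether to re-submit, while the resubmit_reserved switch covers the case where the runner should do so automatically.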

Subtasks 1 (1 open, 0 closed)

Task #20761: Review (New, Peter Amstutz)

Related issues

Related to Arvados - Feature #19975: Option to re-submit container with higher memory request if previous job was killed and crunchstat shows >90% memory usage (Resolved, Peter Amstutz, 03/06/2023)
Related to Arvados - Feature #19974: Option to re-submit preemptible jobs to reserved nodes when previous attempt was interrupted (New)
Related to Arvados Epics - Idea #18179: Better spot instance support (In Progress, 03/01/2022, 06/30/2024)
Related to Arvados - Bug #20606: Unstartable preemptible:true containers should not be reused by non-retryable preemptible:false requests (Resolved, Tom Clegg, 06/27/2023)
Blocked by Arvados - Feature #19961: Detect and log spot instance interruption notices (Resolved, Tom Clegg, 02/16/2023)