Feature #19982

open

Ability to know when a container died because of spot instance reclamation and option to resubmit

Added by Peter Amstutz about 1 year ago. Updated 10 days ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: CWL
Start date:
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: 3.0

Description

New arvados-cwl-runner behavior when spot instances are enabled:

  • When submitting to spot instances, don't retry
  • Ability to detect when a container failed because its spot instance was reclaimed (#19961)
  • Exit code indicating that the workflow failed because a spot instance was reclaimed
  • Option to automatically re-submit on a reserved instance
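The behavior listed above amounts to a small decision table. What follows is a minimal sketch of that logic, assuming a simplified per-step result record. Everything in it is hypothetical: `StepResult`, `max_retries`, `next_action`, and the exit code value 4 are illustrative placeholders, not arvados-cwl-runner's actual API or exit codes. The `spot_reclaimed` flag stands in for whatever signal the detection from #19961 provides.

```python
from dataclasses import dataclass

# Hypothetical exit codes. The ticket does not specify values; 4 is a placeholder.
EXIT_SUCCESS = 0
EXIT_FAILURE = 1
EXIT_SPOT_RECLAIMED = 4


@dataclass
class StepResult:
    """Hypothetical summary of one finished container."""
    succeeded: bool
    preemptible: bool      # was the request submitted to a spot instance?
    spot_reclaimed: bool   # assumed signal from spot interruption detection (#19961)


def max_retries(preemptible: bool, default_retries: int) -> int:
    """When submitting to spot instances, don't retry."""
    return 0 if preemptible else default_retries


def next_action(result: StepResult, resubmit_on_reserved: bool) -> str:
    """Decide what the runner does after a step finishes."""
    if result.succeeded:
        return "done"
    if result.preemptible and result.spot_reclaimed:
        # Either re-submit on a reserved (non-preemptible) instance,
        # or exit with the dedicated spot-reclamation code.
        return "resubmit_reserved" if resubmit_on_reserved else "exit_spot_reclaimed"
    # Any other failure is an ordinary workflow failure.
    return "exit_failure"


# Terminal actions map to process exit codes; "resubmit_reserved" is not
# terminal, so it has no entry.
EXIT_CODES = {
    "done": EXIT_SUCCESS,
    "exit_failure": EXIT_FAILURE,
    "exit_spot_reclaimed": EXIT_SPOT_RECLAIMED,
}
```

Keeping the spot-reclamation exit code distinct from the generic failure code lets a caller script tell "the workflow is broken" apart from "try again on reserved capacity" without parsing logs.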

Subtasks: 1 (1 open, 0 closed)

Task #20761: Review (New, Peter Amstutz)

Related issues

Related to Arvados - Feature #19975: Option to re-submit container with higher memory request if previous job was killed and crunchstat shows >90% memory usage (Resolved, Peter Amstutz, 03/06/2023)
Related to Arvados - Feature #19974: Option to re-submit preemptible jobs to reserved nodes when previous attempt was interrupted (New)
Related to Arvados Epics - Story #18179: Better spot instance support (In Progress, 03/01/2022 to 03/31/2024)
Related to Arvados - Bug #20606: Unstartable preemptible:true containers should not be reused by non-retryable preemptible:false requests (Resolved, Tom Clegg, 06/27/2023)
Blocked by Arvados - Feature #19961: Detect and log spot instance interruption notices (Resolved, Tom Clegg, 02/16/2023)