Feature #19982

Ability to know when a container died because of spot instance reclamation and option to resubmit

Added by Peter Amstutz 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: CWL
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: 3.0

Description

New arvados-cwl-runner behavior when spot instances are enabled

  • When submitting a container to run on a spot instance, don't retry it
  • Ability to detect when a container failed because its spot instance was reclaimed (#19961)
  • A distinct exit code to indicate that the workflow failed due to spot instance reclamation
  • Option to automatically re-submit on a reserved instance
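
A minimal sketch of how the behavior listed above could fit together, offered only as an illustration. Every name here (`Outcome`, `Request`, `run_with_spot_policy`, `EXIT_SPOT_RECLAIMED`) is hypothetical and is not part of arvados-cwl-runner; the exit code value 75 is an arbitrary placeholder, since this ticket does not specify one. The `submit` callable stands in for whatever mechanism actually runs the container and reports why it ended.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# Hypothetical exit code: the ticket asks for a distinct code but names no value.
EXIT_SPOT_RECLAIMED = 75


class Outcome(Enum):
    """Hypothetical summary of how a container run ended."""
    SUCCESS = "success"
    FAILED = "failed"
    SPOT_RECLAIMED = "spot_reclaimed"


@dataclass
class Request:
    """Hypothetical stand-in for a container request."""
    name: str
    preemptible: bool  # True means "run on a spot instance"


def run_with_spot_policy(
    request: Request,
    submit: Callable[[Request], Outcome],
    resubmit_reserved: bool = False,
) -> int:
    """Submit once; never blindly retry a spot run that was reclaimed.

    Returns 0 on success, EXIT_SPOT_RECLAIMED if the spot instance was
    reclaimed and re-submission is disabled, and 1 for any other failure.
    """
    outcome = submit(request)
    if outcome is Outcome.SUCCESS:
        return 0
    if outcome is Outcome.SPOT_RECLAIMED and request.preemptible:
        if resubmit_reserved:
            # Opt-in: try exactly once more on a reserved (non-spot) instance.
            reserved = Request(request.name, preemptible=False)
            return 0 if submit(reserved) is Outcome.SUCCESS else 1
        return EXIT_SPOT_RECLAIMED
    return 1
```

With re-submission disabled, a caller can tell a reclaimed spot instance apart from an ordinary failure by checking for the distinct exit code, then decide for itself whether to run the workflow again.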

Related issues

  • Related to Arvados - Feature #19975: Option to re-submit container with higher memory request if previous job was killed and crunchstat shows >90% memory usage (Resolved, Peter Amstutz, 03/06/2023)
  • Related to Arvados - Feature #19974: Option to re-submit preemptible jobs to reserved nodes when previous attempt was interrupted (New)
  • Blocked by Arvados - Feature #19961: Detect and log spot instance interruption notices (Resolved, Tom Clegg, 02/16/2023)