Feature #19982
open
Ability to know when a container died because of spot instance reclamation and option to resubmit
Added by Peter Amstutz 2 months ago.
Updated about 2 months ago.
Description
New arvados-cwl-runner behavior when spot instances are enabled
- When submitting to a spot instance, don't retry
- Ability to detect when a container failed due to a reclaimed spot instance (#19961)
- Exit code to indicate that the workflow failed due to spot instance reclamation
- Option to automatically re-submit on a reserved instance
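
A rough sketch of the decision logic the list above describes, for discussion only. Everything named here is an assumption and not existing arvados-cwl-runner code: the function `plan_after_attempt`, the `resubmit_as_reserved` option, and the exit code value `EXIT_SPOT_RECLAIMED` are hypothetical, and `spot_interrupted` stands in for whatever signal #19961 ends up providing.

```python
from typing import Tuple, Union

EXIT_SUCCESS = 0
EXIT_FAILURE = 1
# Hypothetical value; this ticket does not specify which exit code to use.
EXIT_SPOT_RECLAIMED = 75

Action = Tuple[str, Union[int, dict]]


def plan_after_attempt(
    preemptible: bool,
    succeeded: bool,
    spot_interrupted: bool,
    resubmit_as_reserved: bool,
) -> Action:
    """Decide what to do after one container attempt.

    preemptible:          the attempt ran on a spot instance
    succeeded:            the container finished successfully
    spot_interrupted:     the failure was attributed to spot reclamation
                          (placeholder for the detection from #19961)
    resubmit_as_reserved: hypothetical user option to re-submit
                          on a reserved instance
    """
    if succeeded:
        return ("exit", EXIT_SUCCESS)

    if preemptible and spot_interrupted:
        if resubmit_as_reserved:
            # Re-submit exactly once, this time asking for a reserved node.
            return ("resubmit", {"preemptible": False})
        # No blind retry on spot: report a distinct exit code instead.
        return ("exit", EXIT_SPOT_RECLAIMED)

    # Any other failure is reported as an ordinary workflow failure.
    return ("exit", EXIT_FAILURE)
```

With this shape, a caller that sees `EXIT_SPOT_RECLAIMED` can tell a reclaimed instance apart from a genuine workflow failure and decide for itself whether to re-run.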
- Blocked by Feature #19961: Detect and log spot instance interruption notices added
- Description updated (diff)
- Category changed from CWL to Crunch
- Description updated (diff)
- Description updated (diff)
- Category changed from Crunch to CWL
- Story points changed from 2.0 to 3.0
- Target version changed from To be groomed to To be scheduled
- Related to Feature #19975: Option to re-submit container with higher memory request if previous job was killed and crunchstat shows >90% memory usage added
- Related to Feature #19974: Option to re-submit preemptible jobs to reserved nodes when previous attempt was interrupted added