Idea #4599

[Crunch] crunch-job should not retry tasks after its SLURM allocation is revoked

Added by Brett Smith over 9 years ago. Updated about 7 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: Crunch
Target version: -
Start date:
Due date:
Story points: 1.0

Description

When a job is assigned to a single node and that node fails, SLURM revokes the job allocation. See #4410 for example logs. crunch-job correctly detects the node failure, but it then tries to rerun the task, which is pointless without the job allocation. The job should fail immediately once the allocation is revoked.
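
The requested behavior can be illustrated with a minimal sketch. This is not crunch-job's code; the names `TaskResult`, `AllocationRevoked`, and `run_with_retries` are hypothetical and exist only for this example. It assumes the caller can already tell a revoked allocation apart from an ordinary task failure, and shows a retry loop that keeps retrying the latter but gives up at once on the former.

```python
from enum import Enum, auto
from typing import Callable


class TaskResult(Enum):
    """Hypothetical outcomes of one task attempt."""
    SUCCESS = auto()
    TASK_FAILED = auto()         # ordinary failure; a retry may help
    ALLOCATION_REVOKED = auto()  # the scheduler revoked the job allocation


class AllocationRevoked(Exception):
    """Raised when the job cannot continue because its allocation is gone."""


def run_with_retries(run_task: Callable[[], TaskResult], max_attempts: int = 3) -> int:
    """Run a task, retrying ordinary failures but never a revoked allocation.

    Returns the number of attempts used when the task succeeds.
    """
    for attempt in range(1, max_attempts + 1):
        result = run_task()
        if result is TaskResult.SUCCESS:
            return attempt
        if result is TaskResult.ALLOCATION_REVOKED:
            # Retrying is pointless without an allocation: fail the job now.
            raise AllocationRevoked(f"allocation revoked on attempt {attempt}")
        # TASK_FAILED: fall through and try again.
    raise RuntimeError(f"task failed after {max_attempts} attempts")
```

With this structure, a revoked allocation ends the job after a single attempt, while an ordinary failure still gets up to `max_attempts` tries.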


Related issues

Related to Arvados - Bug #4410: [Crunch] crunch-job should exit tempfail when a SLURM node fails (Resolved, Brett Smith, 11/04/2014)