Story #4599

[Crunch] crunch-job should not retry tasks after its SLURM allocation is revoked

Added by Brett Smith almost 3 years ago. Updated 5 months ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Crunch
Target version: -
Story points: 1.0
Velocity based estimate: -

Description

When a job is assigned to a single node and that node fails, SLURM revokes the job allocation. See #4410 for example logs. crunch-job correctly detects the node failure, but it then tries to retry the task, which is pointless without the job allocation. The job should fail immediately once the allocation is revoked.
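
The requested behavior can be sketched roughly as follows. This is an illustration in Python, not crunch-job's actual code. The names `AllocationRevoked`, `slurm_allocation_active`, and `run_task_with_retries` are hypothetical. The allocation check assumes that `squeue` reports `RUNNING` for a live allocation and that `SLURM_JOB_ID` is set in the environment; a real dispatcher might detect revocation differently.

```python
import os
import subprocess


class AllocationRevoked(Exception):
    """Raised when the SLURM allocation is gone, so a retry cannot succeed."""


def slurm_allocation_active(job_id=None):
    """Assumed check: treat the allocation as live only if squeue says RUNNING."""
    job_id = job_id or os.environ.get("SLURM_JOB_ID")
    if not job_id:
        return False
    # %T prints the job state in its extended form (e.g. RUNNING, PENDING).
    result = subprocess.run(
        ["squeue", "--noheader", "--jobs", str(job_id), "--format", "%T"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "RUNNING"


def run_task_with_retries(run_task, allocation_active, max_attempts=3):
    """Run a task, retrying on failure only while the allocation still exists.

    run_task: callable returning True on success, False on failure.
    allocation_active: callable returning True while the allocation is live.
    Returns the attempt number on which the task succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        if run_task():
            return attempt
        # Check the allocation before retrying; fail the job right away if it
        # has been revoked instead of scheduling another doomed attempt.
        if not allocation_active():
            raise AllocationRevoked(
                f"allocation revoked after attempt {attempt}; not retrying"
            )
    raise RuntimeError(f"task failed after {max_attempts} attempts")
```

A caller would pass `slurm_allocation_active` as the `allocation_active` argument and treat `AllocationRevoked` as an immediate job failure. Passing the check in as a parameter keeps the retry logic independent of how revocation is actually detected.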


Related issues

Related to Arvados - Bug #4410: [Crunch] crunch-job should exit tempfail when a SLURM nod... Resolved 11/04/2014

History

#1 Updated by Brett Smith almost 3 years ago

  • Subject changed from crunch-job should not retry tasks after its SLURM allocation is revoked to [Crunch] crunch-job should not retry tasks after its SLURM allocation is revoked
  • Description updated (diff)
  • Category set to Crunch
  • Story points set to 1.0

#2 Updated by Tom Clegg 6 months ago

  • Status changed from New to Closed

#3 Updated by Tom Clegg 5 months ago

  • Target version deleted (Arvados Future Sprints)
