Bug #14596

closed

[crunch-dispatch-slurm] Abandoned container starts again instead of being cancelled

Added by Tom Clegg almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -
Release relationship: Auto

Description

This seems to happen sometimes:
  1. Container is queued
  2. c-d-slurm submits a slurm job to run the container
  3. Something goes wrong -- perhaps the crunch-run process dies, or the slurm node goes down?
  4. c-d-slurm submits another slurm job, and the same container runs again on a different node (we aren't certain that c-d-slurm is what resubmits it)
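
The review branch name on this ticket (14596-check-container-locked) suggests a guard on the container's lock state, but the ticket does not describe the fix. The following is therefore only a minimal sketch of that kind of check, not the actual crunch-dispatch-slurm code. The `Container` class, the `should_submit` function, and the `already_submitted` set are hypothetical names introduced for illustration. The state names and the `locked_by_uuid` field are assumed to follow the Arvados container record.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Container:
    """Hypothetical, simplified view of a container record."""
    uuid: str
    state: str  # assumed values: "Queued", "Locked", "Running", "Complete", "Cancelled"
    locked_by_uuid: Optional[str] = None


def should_submit(container: Container, dispatcher_uuid: str,
                  already_submitted: Set[str]) -> bool:
    """Return True only when submitting a new slurm job looks safe.

    A job is submitted only if this dispatcher has not already submitted
    one for the container, the container is in the "Locked" state, and
    the lock is held by this dispatcher. A container that was abandoned
    after starting (for example, still "Running" with no live process)
    is not resubmitted by this check.
    """
    if container.uuid in already_submitted:
        return False
    if container.state != "Locked":
        return False
    return container.locked_by_uuid == dispatcher_uuid
```

With a check like this, the first submission in step 2 goes through, while a second submission for a container that has already been submitted, or that is no longer locked by this dispatcher, is refused and the container can be cancelled instead.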

Subtasks: 1 (0 open, 1 closed)

Task #14607: Review 14596-check-container-locked (Resolved, Tom Clegg, 12/13/2018)
