Bug #13022

closed

crunch-run broken container loop

Added by Peter Amstutz almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -
Release relationship: Auto

Description

https://workbench.9tee4.arvadosapi.com/container_requests/9tee4-xvhdp-vopb57pt6o9eij1#Log

Failed partway through initialization:

2018-02-01T20:05:03.402107528Z While attaching container stdout/stderr streams: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: connect: no such file or directory
2018-02-01T20:05:03.470730548Z Running [arv-mount --unmount-timeout=8 --unmount /tmp/crunch-run.9tee4-dz642-gobx4a24ihi8xpj.743593838/keep576772597]

Then it gets stuck in a loop trying to re-run the container:

2018-02-01T20:06:03.263329220Z Creating Docker container
2018-02-01T20:06:03.267277338Z While creating container: Error response from daemon: Conflict. The name "/9tee4-dz642-gobx4a24ihi8xpj" is already in use by container d2fd14fd8d99ff51fb31b489c285eb767a0309cc64d37317250ce5c0ee7b5802. You have to remove (or rename) that container to be able to reuse that name.
2018-02-01T20:06:03.345808678Z Running [arv-mount --unmount-timeout=8 --unmount /tmp/crunch-run.9tee4-dz642-gobx4a24ihi8xpj.248318477/keep062669320] 

In addition, arv-mount apparently gets terminated (maybe by slurm doing killpg?) but the run directory is left in /tmp and there is a dangling mountpoint in mtab.

Looking at compute0.9tee4, I saw evidence (garbage in /tmp) that this has happened before.
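
To check other compute nodes for the same kind of garbage, something like the following could be used. This is only a rough sketch, not part of crunch-run or arv-mount: the function names (`parse_mountpoints`, `find_run_dirs`) are made up for illustration, and the directory name pattern is an assumption inferred from the two paths in the log lines above. It lists every matching run directory together with any mountpoints still recorded beneath it; it does not check whether the owning crunch-run process is still alive, so a directory belonging to a running container would be listed too.

```python
import os
import re

# Assumed naming, inferred from the log lines above:
#   crunch-run.<container UUID>.<numeric suffix>
RUN_DIR_PATTERN = re.compile(r"^crunch-run\.[0-9a-z]{5}-dz642-[0-9a-z]{15}\.\d+$")


def parse_mountpoints(mounts_text):
    """Return the mountpoint paths (second field) from mtab-format text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # mtab writes a space inside a path as the escape \040
            points.append(fields[1].replace("\\040", " "))
    return points


def find_run_dirs(tmp_root, mounts_text):
    """Map each crunch-run directory under tmp_root to the mountpoints listed inside it."""
    mountpoints = parse_mountpoints(mounts_text)
    found = {}
    for name in sorted(os.listdir(tmp_root)):
        path = os.path.join(tmp_root, name)
        if RUN_DIR_PATTERN.match(name) and os.path.isdir(path):
            prefix = path + os.sep
            found[path] = [m for m in mountpoints if m.startswith(prefix)]
    return found
```

On a compute node this would be called as `find_run_dirs("/tmp", open("/etc/mtab").read())`. A directory with a non-empty list still has a mountpoint recorded in mtab; one with an empty list is just leftover files.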


Subtasks 1 (0 open, 1 closed)

Task #13030: Review 13022-tmp-cleanup (Resolved, Tom Clegg, 02/05/2018)

Related issues

Related to Arvados - Bug #13095: when slurm murders a crunch2 job because it exceeds the memory limit, the container is left with a null `log` (Closed, Joshua Randall)