Bug #13022

Updated by Peter Amstutz about 6 years ago

https://workbench.9tee4.arvadosapi.com/container_requests/9tee4-xvhdp-vopb57pt6o9eij1#Log 

 The container failed partway through initialization: 

 <pre> 
 2018-02-01T20:05:03.402107528Z While attaching container stdout/stderr streams: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: connect: no such file or directory 
 2018-02-01T20:05:03.470730548Z Running [arv-mount --unmount-timeout=8 --unmount /tmp/crunch-run.9tee4-dz642-gobx4a24ihi8xpj.743593838/keep576772597] 
 </pre> 

 It then gets stuck in a loop trying to re-run the container: 

 <pre> 
 2018-02-01T20:06:03.263329220Z Creating Docker container 
 2018-02-01T20:06:03.267277338Z While creating container: Error response from daemon: Conflict. The name "/9tee4-dz642-gobx4a24ihi8xpj" is already in use by container d2fd14fd8d99ff51fb31b489c285eb767a0309cc64d37317250ce5c0ee7b5802. You have to remove (or rename) that container to be able to reuse that name. 
 2018-02-01T20:06:03.345808678Z Running [arv-mount --unmount-timeout=8 --unmount /tmp/crunch-run.9tee4-dz642-gobx4a24ihi8xpj.248318477/keep062669320]  
 </pre> 

 In addition, arv-mount apparently gets terminated (maybe by slurm doing killpg?), but the run directory is left behind in /tmp and a dangling mountpoint remains in mtab. 

 Looking at compute0.9tee4, I saw evidence (garbage in /tmp) that this has happened before. 
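
 To check a compute node for leftover mountpoints like the ones described above, something along these lines might help. It is only a rough sketch: the path pattern is inferred from the two arv-mount lines in the logs, and the function name @find_crunch_run_mounts@ is made up for illustration. It lists candidates only and does not check whether the owning crunch-run process is still alive.

```python
import re

# Assumed pattern, inferred from the paths in the logs above:
# /tmp/crunch-run.<container uuid>.<digits>/keep<digits>
CRUNCH_RUN_MOUNT = re.compile(r"^/tmp/crunch-run\.[^/]+\.\d+/keep\d+$")


def find_crunch_run_mounts(mount_table_text):
    """Return mountpoints in a mount table that look like crunch-run keep mounts.

    mount_table_text is the text of /proc/mounts or /etc/mtab.
    """
    mounts = []
    for line in mount_table_text.splitlines():
        fields = line.split()
        # Each mount table line is: device mountpoint fstype options dump pass
        if len(fields) >= 2 and CRUNCH_RUN_MOUNT.match(fields[1]):
            mounts.append(fields[1])
    return mounts
```

 Passing it the contents of /proc/mounts (or /etc/mtab) on the node returns the matching mountpoints, which can then be compared against the containers that are actually running.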