Bug #7444

Updated by Brett Smith over 8 years ago

We use @docker run --rm@ to ensure that Docker containers are removed after tasks are finished, to prevent compute nodes from filling up with unused volumes. However, "@docker run --rm@ is handled by the Docker client":https://github.com/docker/docker/issues/16575. It simply makes the necessary API calls to remove the container after it exits.

Crunch's cancel code kills the Docker client. If a user cancels a job, the container will hang around, along with its volumes. We just had a situation where compute nodes on a cluster filled their @/tmp@ partitions, because a user was canceling many jobs, leaving finished Docker containers and their large tmp volumes.

Make sure that when Crunch cancels a job, the corresponding Docker container is removed.

Compute nodes on tb05z were taken out of rotation because the slurmd spool directory was full. It was mostly full of VFS directories for Docker containers that were no longer running.

<pre>compute0.tb05z# docker ps -a 
 CONTAINER ID          IMAGE                                                                       COMMAND                  CREATED               STATUS                        PORTS                 NAMES 
 4e236764f52c          d33416e64af4370471ed15d19211e84991a8e158626199f4e4747e4310144b83:latest     "stdbuf --output=0 -     18 hours ago          Exited (1) 18 hours ago                           elated_wozniak 
 b904e1f16f1f          1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest     "stdbuf --output=0 -     18 hours ago                                                          nostalgic_franklin 
 59fec9bcda68          998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest     "stdbuf --output=0 -     19 hours ago          Up 19 hours                                       goofy_wilson 
 6fe7c70100ef          998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest     "stdbuf --output=0 -     20 hours ago          Exited (1) 19 hours ago                           mad_albattani 
 d1f80841ca42          998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest     "stdbuf --output=0 -     20 hours ago          Exited (1) 20 hours ago                           hopeful_ritchie 
 a366129c6a1b          998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest     "stdbuf --output=0 -     20 hours ago          Exited (1) 20 hours ago                           angry_cori 
 dbb29b69f7a3          998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest     "stdbuf --output=0 -     20 hours ago          Exited (1) 20 hours ago                           elegant_fermat 
 b83c876b8ecf          998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest     "stdbuf --output=0 -     22 hours ago          Exited (1) 20 hours ago                           modest_bell 
 fff2c5d781ec          1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest     "stdbuf --output=0 -     22 hours ago          Exited (127) 22 hours ago                         stoic_feynman 
 5a4fc1333fef          d33416e64af4370471ed15d19211e84991a8e158626199f4e4747e4310144b83:latest     "stdbuf --output=0 -     23 hours ago          Exited (1) 22 hours ago                           hopeful_hodgkin 
 a69f0ff99682          d33416e64af4370471ed15d19211e84991a8e158626199f4e4747e4310144b83:latest     "stdbuf --output=0 -     42 hours ago          Exited (1) 41 hours ago                           admiring_hoover 
 48bde55948f1          998107ac8e8ccbaf326337b69bf723890f0627322165a06fb090b1c8405611fc:latest     "stdbuf --output=0 -     46 hours ago          Exited (1) 46 hours ago                           stupefied_kowalevski 
 b62f346c8e85          1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest     "stdbuf --output=0 -     2 days ago            Exited (1) 2 days ago                             loving_mcclintock 
 12f20e821bbd          b85dffb1be2ca7bc757be6ff8ae4873a45214918282ef42cc2cbc2cead63356b:latest     "stdbuf --output=0 -     4 days ago            Exited (1) 3 days ago                             loving_bell 
 7b6ed97e23ae          1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest     "stdbuf --output=0 -     4 days ago            Exited (1) 4 days ago                             determined_pasteur 
 e258841ffcf1          1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest     "stdbuf --output=0 -     6 days ago            Exited (1) 6 days ago                             furious_leakey 
 3109f9488c66          b85dffb1be2ca7bc757be6ff8ae4873a45214918282ef42cc2cbc2cead63356b:latest     "stdbuf --output=0 -     8 days ago            Exited (1) 8 days ago                             fervent_thompson 
 164c4d49e8ce          1571e3450e5a2ab0a1468e306e23b186e172e222ce43056b351fcad993c75c88:latest     "stdbuf --output=0 -     10 days ago           Exited (1) 10 days ago                            sad_wozniak 
 242c764bfe5d          882cc785701a5d3a20d5fa5e244d22beb09e4861b4f1c654867f0ca0c154b029:latest     "stdbuf --output=0 -     12 days ago           Exited (1) 12 days ago                            sharp_archimedes 
 d83d9e200705          b85dffb1be2ca7bc757be6ff8ae4873a45214918282ef42cc2cbc2cead63356b:latest     "stdbuf --output=0 -     13 days ago           Exited (1) 13 days ago                            modest_mccarthy 
 </pre> 

Why weren't these containers removed? crunch-job on the cluster is new enough to use @--rm@, and Docker is new enough to respect it (1.6.0).

The fact that these containers exited with status 1 doesn't seem to explain it, either:

 <pre>brinstar % docker version 
 Client version: 1.6.0 
 Client API version: 1.18 
 Go version (client): go1.4.2 
 Git commit (client): 4749651 
 OS/Arch (client): linux/amd64 
 Server version: 1.6.0 
 Server API version: 1.18 
 Go version (server): go1.4.2 
 Git commit (server): 4749651 
 OS/Arch (server): linux/amd64 
 brinstar % docker ps -a 
 CONTAINER ID          IMAGE                 COMMAND               CREATED               STATUS                PORTS                 NAMES 
 brinstar % docker run --rm=true debian:wheezy /bin/false 
 brinstar % docker ps -a 
 CONTAINER ID          IMAGE                 COMMAND               CREATED               STATUS                PORTS                 NAMES 
 brinstar % 
 </pre>
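
Since @--rm@ is carried out by the Docker client, one possible direction is for whatever launches the client to do the removal itself once the client has exited or been killed. The following is only a rough sketch of that idea in Python, not Crunch's actual code: the function names (@build_run_command@, @build_cleanup_command@, @read_container_id@, @run_task@) and the injectable @run@ parameter are hypothetical. It assumes the @--cidfile@ option of @docker run@ and the @-f@ and @-v@ options of @docker rm@ are available.

```python
import os
import shutil
import subprocess
import tempfile


def build_run_command(image, argv, cidfile):
    """Build a `docker run` command that records the container ID in cidfile."""
    return ["docker", "run", "--cidfile", cidfile, image] + list(argv)


def build_cleanup_command(container_id):
    """Build a `docker rm` command: -f kills a running container, -v removes its volumes."""
    return ["docker", "rm", "-f", "-v", container_id]


def read_container_id(cidfile):
    """Return the recorded container ID, or None if no container was created."""
    try:
        with open(cidfile) as f:
            cid = f.read().strip()
    except OSError:
        return None
    return cid or None


def run_task(image, argv, run=subprocess.call):
    """Run a task, then remove its container even if the client died early.

    `run` is a hypothetical injection point (defaults to subprocess.call)
    so the logic can be exercised without a Docker daemon.
    """
    workdir = tempfile.mkdtemp()
    # The cidfile must not exist before `docker run` starts, so only a path
    # inside a fresh directory is reserved here.
    cidfile = os.path.join(workdir, "cid")
    try:
        return run(build_run_command(image, argv, cidfile))
    finally:
        cid = read_container_id(cidfile)
        if cid is not None:
            run(build_cleanup_command(cid))
        shutil.rmtree(workdir, ignore_errors=True)
```

The cleanup lives in the caller rather than in the client, so it still happens when the client is killed during a cancel. A launcher that is itself terminated by a signal would need its own signal handling on top of this.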
