Bug #18486
Docker containers are always removed
Status: open
Description
Observed in Arvados 2.3.1:
When trying to debug a CWL workflow running on the Docker container runtime, it appears that the Docker containers are automatically removed after they have finished running.
This happens regardless of the arvados-docker-cleaner service running or the RemoveStoppedContainers setting in its config file.
Updated by Peter Amstutz about 3 years ago
- Release deleted (45)
As a process note, we use the "Release" field to designate the release in which a bug is being fixed, not the release in which it was found.
Updated by Peter Amstutz about 3 years ago
You want the (stopped) containers themselves to stick around, not just the images? In general we avoid that because you can fill up your scratch space very quickly, and users typically don't have access to compute nodes with containers anyway.
However we could add some kind of admin-level configuration option for debugging in those cases where the users do have access to the compute node.
You might also be interested in the container shell access feature:
https://doc.arvados.org/v2.3/install/container-shell-access.html
https://doc.arvados.org/v2.3/user/debugging/container-shell-access.html
Updated by Tom Schoonjans about 3 years ago
Peter Amstutz wrote:
You want the (stopped) containers themselves to stick around, not just the images? In general we avoid that because you can fill up your scratch space very quickly, and users typically don't have access to compute nodes with containers anyway.
However we could add some kind of admin-level configuration option for debugging in those cases where the users do have access to the compute node.
You might also be interested in the container shell access feature:
https://doc.arvados.org/v2.3/install/container-shell-access.html
https://doc.arvados.org/v2.3/user/debugging/container-shell-access.html
Yes. When we ran into trouble with the Singularity runtime last week, I gave the Docker runtime a try instead, but couldn't debug any issues because the containers were removed immediately after they finished running. This seems to contradict the note in https://doc.arvados.org/v2.3/install/crunch2/install-compute-node-docker.html#docker-cleaner, which states that the arvados-docker-cleaner daemon is responsible for cleaning up Docker containers (and images). That implies no containers should ever be removed if the daemon is not running, or when "RemoveStoppedContainers":"never" is set in its config file.
This is not really an issue for us, since we got the Singularity runtime up and running again after Tom's analysis of the problem and suggested fix, but I thought it would be good for you to know about this.
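For reference, the setting in question would look roughly like this in the docker-cleaner JSON config file. This is a sketch only: the "Quota" key and its value are assumptions recalled from the install docs and are shown as placeholders, and only the "RemoveStoppedContainers" value comes from this report. As observed above, this setting did not stop the containers from being removed.

```json
{
    "Quota": "10G",
    "RemoveStoppedContainers": "never"
}
```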
Updated by Peter Amstutz about 3 years ago
When the container stops, we call ContainerRemove(). That's by design.
The docker-cleaner service is a bit of a legacy. I think you're right that the documentation is a little misleading. It's been our intention to get rid of docker-cleaner entirely and have crunch-run be responsible for cleaning up containers and container images when it starts (#12900). That would be closer to what you want, because the container would then stick around at least until the next container starts (or you could drain the node to prevent new jobs from being scheduled).
On the other hand, Singularity doesn't leave containers or container images around at all after it stops, and we load the Singularity image from arv-mount, so none of this applies to the Singularity case.
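To make the difference between the two cleanup policies concrete, here is a minimal Python sketch. It is not crunch-run code and does not use the Docker API: `FakeRuntime`, `run_remove_on_stop`, and `run_prune_on_start` are hypothetical names invented for illustration, and the "runtime" is just an in-memory list of stopped containers.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRuntime:
    """In-memory stand-in for a container runtime (hypothetical, not Docker)."""
    stopped: list = field(default_factory=list)

    def run(self, name):
        # Pretend the container ran to completion and is now stopped.
        self.stopped.append(name)

    def remove(self, name):
        self.stopped.remove(name)


def run_remove_on_stop(rt, name):
    """Current behaviour as described above: remove as soon as it stops."""
    rt.run(name)
    rt.remove(name)


def run_prune_on_start(rt, name):
    """Idea behind #12900: clean up leftovers first, keep the new container."""
    for old in list(rt.stopped):
        rt.remove(old)
    rt.run(name)


rt = FakeRuntime()
run_remove_on_stop(rt, "job-1")
print(rt.stopped)  # [] : nothing left to inspect

rt = FakeRuntime()
run_prune_on_start(rt, "job-1")
run_prune_on_start(rt, "job-2")
print(rt.stopped)  # ['job-2'] : the most recent container survives
```

Under the second policy, the last stopped container stays available for debugging until another one starts on the same node, which is why draining the node would preserve it.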
Updated by Peter Amstutz about 3 years ago
- Related to Feature #12900: [Crunch2] [crunch-run] Prune old images before installing image for current container added