Feature #19099
closedSupport "arvados-client shell" when using arvados-dispatch-cloud + singularity
Description
Currently, the container-shell feature works only with arvados-dispatch-cloud + docker, because it relies on "docker exec" to inject a new process into an existing container.
As demonstrated in #18993, the nsenter
program (part of util-linux debian pkg) can inject a new process into a running singularity container if crunch-run is running as root or the nsenter binary has setuid/setcap attrs. In the arvados-dispatch-cloud scenario, crunch-run runs as root. When we extend this to slurm/lsf dispatch (not covered here) we'll need a setuid/setcap setup.
- find the PID of a process in the container
- e.g., use "lsns" to list namespaced PIDs, find one that has a "pid" namespace and is a child of our singularity child process (parent process is 4th field of /proc/{pid}/stat1)
- find the IP address of the container (in order to implement port-forwarding)
- When networking is enabled and we are running as root, use the singularity "--net" flag so the container can't access the host interface
- currently we default to host networking instead of bridged, which seems to have been an oversight
- if we are not running as root (e.g., LSF), it might work to use
--fakeroot --net
to isolate networking (a quick test with singularity 3.9.9 prints an error "Network fakeroot is not permitted for unprivileged users" but seems to work anyway -- seems like more investigation of compatibility/side effects is needed before making this change)
- Parse /proc/{ctr_pid}/net/fib_trie and /proc/self/net/fib_trie; the "/32 host LOCAL" entry that is present in the former but missing from the latter is the container's local IP address.
- When networking is enabled and we are running as root, use the singularity "--net" flag so the container can't access the host interface
- use "nsenter --target={ctr_pid} --all cmd" instead of "docker exec -i cmd"
We should also have a test case for this -- perhaps start a "sleep 60" container, then use container gateway to run "kill 1" inside the container. Currently there is no automated test for the docker implementation.
1 parsing /proc/*/stat is slightly sketchy -- start at the last ")"? ...or look for "\nPPid:\t%d\n" in /proc/*/status instead
$ ln -s `which cat` '/tmp/patho logical)' $ '/tmp/patho logical)' /proc/self/stat 3552009 (patho logical)) R 3542894 3552009 3542894 34850 3552009 4194304 101 0 0 0 0 0 0 0 20 0 1 0 275456935 8163328 129 18446744073709551615 93951078178816 93951078196905 140729773453040 0 0 0 0 0 0 0 0 0 17 5 0 0 0 0 0 93951078214736 93951078216320 93951102492672 140729773454886 140729773454922 140729773454922 140729773457380 0
Updated by Tom Clegg over 2 years ago
- Related to Idea #18993: research project: design for being able to start up shell inside an existing singularity container added
Updated by Tom Clegg over 2 years ago
19099-singularity-container-shell @ e30b7ec3040cac89a2e134fddf8cb47c1905ea82 -- developer-run-tests: #3125
Updated by Tom Clegg over 2 years ago
- enables "container shell" feature when using singularity + arvados-dispatch-cloud
- port-forwarding test is skipped if the test host doesn't appear to be set up with enough privileges (it needs to use
singularity --fakeroot
to give the container its own network interface address, but that feature depends on a kernel config, /proc/sys/kernel/unprivileged_userns_clone) - if crunch-run is running as root (as it does under arvados-dispatch-cloud) and the container has EnableNetwork=true, we now use "bridge" networking mode instead of the default non-isolated/host network.
- I think it would be desirable to use bridge mode in the HPC/non-root scenario as well, but we'd need to either probe for permission to use
--network=bridge
or make it an explicit config knob (crunch-run --singularity-network=bridge
?).
- I think it would be desirable to use bridge mode in the HPC/non-root scenario as well, but we'd need to either probe for permission to use
Updated by Lucas Di Pentima over 2 years ago
Some questions:
- Would it be more convenient to use a random unused port for the
TestIPAddress
test in filelib/crunchrun/executor_test.go
? - Should
TestIPAddress
andTestInject
be called from the docker's test suite too? - Is there a way to manually test this? I've seen that arvbox uses singularity by default but I'm not able to make it run a simple "sleep 300" workflow and also I'm getting a "Not starting a gateway server (GatewayAuthSecret was not provided by dispatcher)" warning so I guess crunch-dispatch-local may not support it.
Updated by Tom Clegg over 2 years ago
- Would it be more convenient to use a random unused port for the
TestIPAddress
test in filelib/crunchrun/executor_test.go
?
On the contrary... the container should be able to listen on its own port 1951 regardless of what's in use on the host. I've updated the test case to ensure the port is in use on the host, and added some comments.
- Should
TestIPAddress
andTestInject
be called from the docker's test suite too?
All of the executorSuite tests are embedded by both dockerSuite and singularitySuite -- the overrides in singularitySuite are only there because it needs to do extra setup / check whether the test should be skipped due to test host config.
- Is there a way to manually test this? I've seen that arvbox uses singularity by default but I'm not able to make it run a simple "sleep 300" workflow and also I'm getting a "Not starting a gateway server (GatewayAuthSecret was not provided by dispatcher)" warning so I guess crunch-dispatch-local may not support it.
Right, container gateway only works with arvados-dispatch-cloud. When #15370 is merged we'll be able to update arvbox to use a-d-c with loopback driver, and add/update a lib/controller integration test to test container-shell end-to-end.
For now I think the easiest way to test manually is to merge & try 9tee4.
19099-singularity-container-shell @ aabea703ee77ca91da710dd9bce9716e5d35d7b2 -- developer-run-tests: #3150
Updated by Lucas Di Pentima over 2 years ago
Tom Clegg wrote:
- Would it be more convenient to use a random unused port for the
TestIPAddress
test in filelib/crunchrun/executor_test.go
?On the contrary... the container should be able to listen on its own port 1951 regardless of what's in use on the host. I've updated the test case to ensure the port is in use on the host, and added some comments.
Thanks, that's really helpful.
LGTM, please merge.
Updated by Tom Clegg over 2 years ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados-private:commit:arvados|469ea187586ea8017e26874c2d80414ce7571fae.