Bug #21750
Closed
crunch-run singularity port forwarding test fails on debian 12
Description
I started seeing this test failure after upgrading from debian 11 to 12:
----------------------------------------------------------------------
FAIL: singularity_test.go:39: singularitySuite.TestIPAddress

building singularity image
[singularity build /tmp/crunch-run-singularity-3013312958/image.sif docker-archive:///tmp/crunch-run-singularity-3013312958/image.tar]
INFO: Starting build...
Getting image source signatures
Copying blob sha256:67f770da229bf16d0c280f232629b0c1f1243a884df09f6b940a1c7288535a6d
Copying config sha256:a11e762410a6fb4e925d1ea535fecc177d983bdf0dba3261d244fb3c7ee18865
Writing manifest to image destination
Storing signatures
2024/05/03 15:06:19 info unpack layer: sha256:378e3b9fb50c743e1daa7a79dc2cf7c18aa0ac8137a1ca0d51a3b909c80e7d48
INFO: Creating SIF file...
INFO: Build complete: /tmp/crunch-run-singularity-3013312958/image.sif
singularity_test.go:50:
    s.executorSuite.TestIPAddress(c)
executor_test.go:210:
    c.Assert(err, IsNil)
... value *url.Error = &url.Error{Op:"Brew", URL:"http://10.23.0.2:44679", Err:(*net.OpError)(0xc000d108c0)} ("Brew \"http://10.23.0.2:44679\": dial tcp 10.23.0.2:44679: connect: connection refused")
It seems that --fakeroot is no longer enough to make --net work when invoking singularity as an unprivileged user:
$ /var/lib/arvados/bin/singularity exec --containall --cleanenv --pwd= /tmp/busybox.sif echo OK
OK
$ /var/lib/arvados/bin/singularity exec --containall --cleanenv --pwd= --fakeroot --net /tmp/busybox.sif echo OK
INFO: Converting SIF file to temporary sandbox...
ERROR: Network fakeroot is not permitted for unprivileged users.
INFO: Cleaning up image...
ERROR: could not delete networks: plugin type="firewall" failed (delete): could not initialize iptables protocol 0: could not get iptables version: exit status 111
FATAL: container creation failed: plugin type="ptp" failed (add): failed to locate iptables: could not get iptables version: exit status 111
Updated by Tom Clegg 4 months ago
- Related to Bug #22050: Pid() did not return a process ID (bug in singularity support?) added
Updated by Tom Clegg 4 months ago
Prodded at this a bit and it just got more confusing.
- singularity --net --network bridge --fakeroot complains "network fakeroot is not permitted for unprivileged users" and exits
- if I add --userns arg, and install rootlesskit, then the error goes away and nc still doesn't run
- It works fine if I use docker://busybox:uclibc -- but not if I convert to sif in a separate step
Updated by Peter Amstutz 3 months ago
- Target version set to Development 2024-10-09 sprint
- Assigned To set to Tom Clegg
Updated by Tom Clegg 3 months ago
It seems the reason for "network fakeroot is not permitted for unprivileged users" and "could not get iptables version: exit status 111" is that iptables and nftables command line programs cannot be used in a setuid environment.
(It might be a nice improvement to improve the error message returned by the go-iptables module in this situation.)
I think singularity could circumvent that check by resetting ruid to euid with something like
cmd.SysProcAttr = &syscall.SysProcAttr{Credential: &syscall.Credential{Uid: 0, Gid: 0}}
(but should it?)
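For reference, a minimal sketch of what that idea looks like with os/exec. This is an illustration only, not singularity's actual code: the helper name commandAsRoot is hypothetical, the example is Linux/Unix-only, and it only builds the command. Actually starting it would succeed only if the calling process is already privileged enough to switch the child's UID/GID to 0.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// commandAsRoot is a hypothetical helper (not part of singularity or
// Arvados). It builds a command whose child process is asked to switch
// to UID 0 / GID 0 before exec, so that a tool which refuses to run in a
// setuid environment (ruid != euid) would see matching IDs.
func commandAsRoot(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{Uid: 0, Gid: 0},
	}
	return cmd
}

func main() {
	cmd := commandAsRoot("iptables", "--version")
	// Only inspect the command here; cmd.Run() would fail unless the
	// caller is allowed to set the child's credentials to root.
	fmt.Println(cmd.Args, cmd.SysProcAttr.Credential.Uid, cmd.SysProcAttr.Credential.Gid)
}
```

Whether a setuid helper should do this at all is the open question above; the sketch only shows the mechanism.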
Meanwhile, in order to enable networking on recent systems (debian 12+), singularity just needs to run as root.
- arvados-dispatch-cloud -- already runs crunch-run→singularity as root.
- slurm/lsf -- currently will fail to run a container when RuntimeConstraints.API is true. We could add a config (or detect the need by checking the iptables --version output) and run sudo singularity ... when needed...?
- tests -- currently fail. We can
  - if running as root, run the test
  - if not running as root and ARVADOS_TEST_USE_SUDO is set, run "sudo singularity ..." in tests that enable networking
  - if not running as root and ARVADOS_TEST_USE_SUDO is not set, skip the test
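The three test cases above can be sketched as a small decision function. This is a hedged illustration, not the code in the branch: planNetworkingTest is a hypothetical name, and treating any non-empty value of ARVADOS_TEST_USE_SUDO as "set" is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// planNetworkingTest is a hypothetical helper. Given the effective UID of
// the test process and the value of ARVADOS_TEST_USE_SUDO, it returns the
// argv to run, or skip=true if the networking test should be skipped.
// Assumption: any non-empty value counts as "set".
func planNetworkingTest(euid int, useSudo string, argv []string) (cmd []string, skip bool) {
	switch {
	case euid == 0:
		// Already root: run singularity directly.
		return argv, false
	case useSudo != "":
		// Not root, but sudo is allowed: prefix the command with sudo.
		return append([]string{"sudo"}, argv...), false
	default:
		// Not root and no sudo: networking cannot be set up, so skip.
		return nil, true
	}
}

func main() {
	argv := []string{"singularity", "exec", "--net", "image.sif", "echo", "OK"}
	cmd, skip := planNetworkingTest(os.Geteuid(), os.Getenv("ARVADOS_TEST_USE_SUDO"), argv)
	if skip {
		fmt.Println("skipping: not root and ARVADOS_TEST_USE_SUDO is not set")
		return
	}
	fmt.Println("would run:", cmd)
}
```

Passing the UID and environment value in as arguments keeps the decision testable without actually being root.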
This branch also modifies TestIPAddress to allow a few seconds for the container to come up and start listening on the port, instead of failing immediately on "connection refused".
21750-singularity-networking @ b77a939d58754017500a9cd1352ac9979aeae119 -- developer-run-tests: #4468
Updated by Tom Clegg 3 months ago
- Status changed from New to In Progress
- Adds a test to confirm that networking is enabled in a singularity container even when not running as root.
Updated by Tom Clegg 3 months ago
- rebased onto main (was based on 20756 but didn't need to be)
Summary:
The test failure was just a test failure; running singularity on an actual debian12 compute node was not broken.
Now, the test no longer tries to use "fakeroot" to test port forwarding:
- if tests run as root or with ARVADOS_TEST_USE_SUDO=1, it [uses sudo and] tests port forwarding in a way that still works on debian 12+
- otherwise, that test is skipped
- there is a separate test that API: true enables networking in the container, which doesn't depend on root, so is never skipped
- in the cloud scenario, crunch-run invokes singularity as root, so networking works as desired
- in the slurm/lsf scenario, crunch-run is not root, so a container with "API: true" just uses the host's network interfaces
Updated by Brett Smith 3 months ago
Tom Clegg wrote in #note-6:
21750-singularity-networking @ 78c476d822deaa9e772f5ceceb7e40ea4b9c0de8 -- developer-run-tests: #4473
This LGTM. My one suggestion would be to give the environment variable a generic name whose value specifies the privilege escalation method to use; e.g., ARVADOS_TEST_PRIVESC=sudo. run0 is a thing now, and in general there's a push to get away from setuid binaries, so I think we'll want to support other methods… eventually. But admittedly I don't know when. At least three years away and very possibly more. So I'm fine with that being tomorrow's problem too.
Updated by Tom Clegg 3 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|4a3b84159f19ecd437b0ef418f394e8cde22b5d2.