Bug #21750
closed
crunch-run singularity port forwarding test fails on debian 12
Added by Tom Clegg 8 months ago.
Updated 3 months ago.
Release relationship: Auto
Description
I started seeing this test failure after upgrading from debian 11 to 12:
----------------------------------------------------------------------
FAIL: singularity_test.go:39: singularitySuite.TestIPAddress
building singularity image
[singularity build /tmp/crunch-run-singularity-3013312958/image.sif docker-archive:///tmp/crunch-run-singularity-3013312958/image.tar]
INFO: Starting build...
Getting image source signatures
Copying blob sha256:67f770da229bf16d0c280f232629b0c1f1243a884df09f6b940a1c7288535a6d
Copying config sha256:a11e762410a6fb4e925d1ea535fecc177d983bdf0dba3261d244fb3c7ee18865
Writing manifest to image destination
Storing signatures
2024/05/03 15:06:19 info unpack layer: sha256:378e3b9fb50c743e1daa7a79dc2cf7c18aa0ac8137a1ca0d51a3b909c80e7d48
INFO: Creating SIF file...
INFO: Build complete: /tmp/crunch-run-singularity-3013312958/image.sif
singularity_test.go:50:
s.executorSuite.TestIPAddress(c)
executor_test.go:210:
c.Assert(err, IsNil)
... value *url.Error = &url.Error{Op:"Brew", URL:"http://10.23.0.2:44679", Err:(*net.OpError)(0xc000d108c0)} ("Brew \"http://10.23.0.2:44679\": dial tcp 10.23.0.2:44679: connect: connection refused")
It seems that --fakeroot is no longer enough to make --net work when invoking singularity as an unprivileged user:
$ /var/lib/arvados/bin/singularity exec --containall --cleanenv --pwd= /tmp/busybox.sif echo OK
OK
$ /var/lib/arvados/bin/singularity exec --containall --cleanenv --pwd= --fakeroot --net /tmp/busybox.sif echo OK
INFO: Converting SIF file to temporary sandbox...
ERROR: Network fakeroot is not permitted for unprivileged users.
INFO: Cleaning up image...
ERROR: could not delete networks: plugin type="firewall" failed (delete): could not initialize iptables protocol 0: could not get iptables version: exit status 111
FATAL: container creation failed: plugin type="ptp" failed (add): failed to locate iptables: could not get iptables version: exit status 111
- Related to Bug #22050: Pid() did not return a process ID (bug in singularity support?) added
Prodded at this a bit and it just got more confusing.
singularity --net --network bridge --fakeroot complains "network fakeroot is not permitted for unprivileged users" and exits.
If I add the --userns arg and install rootlesskit, the error goes away, but nc still doesn't run.
It works fine if I use docker://busybox:uclibc -- but not if I convert to sif in a separate step
- Target version set to Development 2024-10-09 sprint
- Assigned To set to Tom Clegg
It seems the reason for "network fakeroot is not permitted for unprivileged users" and "could not get iptables version: exit status 111" is that iptables and nftables command line programs cannot be used in a setuid environment.
(It might be a nice improvement for the go-iptables module to return a clearer error message in this situation.)
I think singularity could circumvent that check by resetting ruid to euid with something like
cmd.SysProcAttr = &syscall.SysProcAttr{Credential: &syscall.Credential{Uid: 0, Gid: 0}}
(but should it?)
Meanwhile, in order to enable networking on recent systems (debian 12+), singularity just needs to run as root.
- arvados-dispatch-cloud -- already runs crunch-run→singularity as root.
- slurm/lsf -- currently will fail to run a container when RuntimeConstraints.API is true. We could add a config (or detect the need by checking the iptables --version output) and run sudo singularity ... when needed...?
- tests -- currently fail. We can
- if running as root, run the test
- if not running as root and ARVADOS_TEST_USE_SUDO is set, run "sudo singularity ..." in tests that enable networking
- if not running as root and ARVADOS_TEST_USE_SUDO is not set, skip the test
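The three test cases above could be sketched roughly as follows. This is only an illustration, not the code in the branch: planNetworkTest and networkTestPlan are hypothetical names, and treating any non-empty ARVADOS_TEST_USE_SUDO value as "set" is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// networkTestPlan describes how a networking test should invoke singularity.
// (Hypothetical type, for illustration only.)
type networkTestPlan struct {
	Skip   bool     // true if the test should be skipped
	Prefix []string // command prefix such as ["sudo"]; empty when already root
}

// planNetworkTest picks one of the three cases from the list above.
// euid is the effective uid of the test process; useSudo is the value of
// ARVADOS_TEST_USE_SUDO (assumption: any non-empty value counts as "set").
func planNetworkTest(euid int, useSudo string) networkTestPlan {
	switch {
	case euid == 0:
		// Already root: run singularity directly.
		return networkTestPlan{}
	case useSudo != "":
		// Not root, but allowed to escalate: run "sudo singularity ...".
		return networkTestPlan{Prefix: []string{"sudo"}}
	default:
		// Not root and not allowed to escalate: skip the test.
		return networkTestPlan{Skip: true}
	}
}

func main() {
	plan := planNetworkTest(os.Geteuid(), os.Getenv("ARVADOS_TEST_USE_SUDO"))
	if plan.Skip {
		fmt.Println("skip: networking tests need root or ARVADOS_TEST_USE_SUDO")
		return
	}
	// Only print the command line here; a real test would execute it.
	argv := append(plan.Prefix, "singularity", "exec", "--net", "/tmp/busybox.sif", "echo", "OK")
	fmt.Println(argv)
}
```

Keeping the decision in a pure function makes it easy to check all three cases without actually needing root.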
This branch also modifies TestIPAddress to allow a few seconds for the container to come up and start listening on the port, instead of failing immediately on "connection refused".
21750-singularity-networking @ b77a939d58754017500a9cd1352ac9979aeae119 -- developer-run-tests: #4468
- Status changed from New to In Progress
21750-singularity-networking @ 78c476d822deaa9e772f5ceceb7e40ea4b9c0de8 -- developer-run-tests: #4473
- rebased onto main (was based on 20756 but didn't need to be)
Summary:
The test failure was just a test failure; running singularity on an actual debian12 compute node was not broken.
Now, the test no longer tries to use "fakeroot" to test port forwarding
- if tests run as root or with ARVADOS_TEST_USE_SUDO=1, it [uses sudo and] tests port forwarding in a way that still works on debian 12+
- otherwise, that test is skipped
- there is a separate test that checks that "API: true" enables networking in the container; it doesn't depend on root, so it is never skipped
In production (unchanged):
- in the cloud scenario, crunch-run invokes singularity as root, so networking works as desired
- in the slurm/lsf scenario, crunch-run is not root, so a container with "API: true" just uses the host's network interfaces
Tom Clegg wrote in #note-6:
21750-singularity-networking @ 78c476d822deaa9e772f5ceceb7e40ea4b9c0de8 -- developer-run-tests: #4473
This LGTM. My one suggestion would be to give the environment variable a generic name whose value specifies the privilege escalation method to use; e.g., ARVADOS_TEST_PRIVESC=sudo. run0 is a thing now, and in general there's a push to get away from setuid binaries, so I think we'll want to support other methods… eventually. But admittedly I don't know when. At least three years away and very possibly more. So I'm fine with that being tomorrow's problem too.
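To make the suggestion concrete, a generic variable could be mapped to a command prefix roughly like this. This is a sketch of a proposal only: ARVADOS_TEST_PRIVESC does not exist in the branch, privescPrefix is a hypothetical helper, and accepting exactly "sudo" and "run0" is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// privescPrefix maps the value of the (hypothetical) ARVADOS_TEST_PRIVESC
// variable to a command prefix. An empty value means "do not escalate".
func privescPrefix(method string) ([]string, error) {
	switch method {
	case "":
		return nil, nil
	case "sudo", "run0":
		// Assumption: each method is invoked as "<method> <command> <args...>".
		return []string{method}, nil
	default:
		return nil, fmt.Errorf("unsupported privilege escalation method %q", method)
	}
}

func main() {
	prefix, err := privescPrefix(os.Getenv("ARVADOS_TEST_PRIVESC"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Only print the command line here; a real test would execute it.
	fmt.Println(append(prefix, "singularity", "exec", "--net", "/tmp/busybox.sif", "echo", "OK"))
}
```

Rejecting unknown values keeps a typo in the variable from silently running the test without escalation.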
- Status changed from In Progress to Resolved