Bug #21750

closed

crunch-run singularity port forwarding test fails on debian 12

Added by Tom Clegg 5 months ago. Updated 2 days ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Story points: -
Release:
Release relationship: Auto

Description

I started seeing this test failure after upgrading from debian 11 to 12:

----------------------------------------------------------------------
FAIL: singularity_test.go:39: singularitySuite.TestIPAddress

building singularity image
[singularity build /tmp/crunch-run-singularity-3013312958/image.sif docker-archive:///tmp/crunch-run-singularity-3013312958/image.tar]
INFO:    Starting build...
Getting image source signatures
Copying blob sha256:67f770da229bf16d0c280f232629b0c1f1243a884df09f6b940a1c7288535a6d
Copying config sha256:a11e762410a6fb4e925d1ea535fecc177d983bdf0dba3261d244fb3c7ee18865
Writing manifest to image destination
Storing signatures
2024/05/03 15:06:19  info unpack layer: sha256:378e3b9fb50c743e1daa7a79dc2cf7c18aa0ac8137a1ca0d51a3b909c80e7d48
INFO:    Creating SIF file...
INFO:    Build complete: /tmp/crunch-run-singularity-3013312958/image.sif

singularity_test.go:50:
    s.executorSuite.TestIPAddress(c)
executor_test.go:210:
    c.Assert(err, IsNil)
... value *url.Error = &url.Error{Op:"Brew", URL:"http://10.23.0.2:44679", Err:(*net.OpError)(0xc000d108c0)} ("Brew \"http://10.23.0.2:44679\": dial tcp 10.23.0.2:44679: connect: connection refused")

It seems that --fakeroot is no longer enough to make --net work when invoking singularity as an unprivileged user:

$ /var/lib/arvados/bin/singularity exec --containall --cleanenv --pwd= /tmp/busybox.sif echo OK
OK
$ /var/lib/arvados/bin/singularity exec --containall --cleanenv --pwd= --fakeroot --net /tmp/busybox.sif echo OK
INFO:    Converting SIF file to temporary sandbox...
ERROR:   Network fakeroot is not permitted for unprivileged users.
INFO:    Cleaning up image...
ERROR:   could not delete networks: plugin type="firewall" failed (delete): could not initialize iptables protocol 0: could not get iptables version: exit status 111
FATAL:   container creation failed: plugin type="ptp" failed (add): failed to locate iptables: could not get iptables version: exit status 111

Subtasks: 1 (0 open, 1 closed)

Task #22149: Review 21750-singularity-networking (Resolved, Tom Clegg, 10/03/2024)

Related issues

Related to Arvados - Bug #22050: Pid() did not return a process ID (bug in singularity support?) (Duplicate, Tom Clegg)
#1

Updated by Tom Clegg about 2 months ago

  • Related to Bug #22050: Pid() did not return a process ID (bug in singularity support?) added
#2

Updated by Tom Clegg about 2 months ago

Prodded at this a bit and it just got more confusing.

singularity --net --network bridge --fakeroot complains "network fakeroot is not permitted for unprivileged users" and exits.

If I add the --userns arg and install rootlesskit, the error goes away, but nc still doesn't run.

It works fine if I use docker://busybox:uclibc, but not if I convert to SIF in a separate step.

#3

Updated by Peter Amstutz 11 days ago

  • Target version set to Development 2024-10-09 sprint
  • Assigned To set to Tom Clegg
#4

Updated by Tom Clegg 6 days ago

It seems the reason for "network fakeroot is not permitted for unprivileged users" and "could not get iptables version: exit status 111" is that the iptables and nftables command-line programs cannot be used in a setuid environment.

(It would be nice to improve the error message returned by the go-iptables module in this situation.)

I think singularity could circumvent that check by resetting the real UID (ruid) to the effective UID (euid) with something like

cmd.SysProcAttr = &syscall.SysProcAttr{Credential: &syscall.Credential{Uid: 0, Gid: 0}}

(but should it?)
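The suggestion above can be sketched as a small Go program. This is not singularity's code: `needsRUIDReset` and `iptablesCommand` are hypothetical helpers, and whether singularity should do this at all is the open question. The sketch only sets the credential when the process looks setuid root (effective UID 0, real UID non-zero). Because the effective UID is 0, the child's setuid(0) call sets its real, effective, and saved UIDs all to 0, so iptables no longer sees a setuid environment.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// needsRUIDReset reports whether the process looks like it is running
// setuid root: effective UID is 0 but the real UID is not.
func needsRUIDReset(uid, euid int) bool {
	return euid == 0 && uid != 0
}

// iptablesCommand builds (but does not start) "iptables --version".
// When running setuid root, it asks Go to call setgid(0) and setuid(0)
// in the child before exec, so the child's real UID matches its
// effective UID. Because Groups is empty and NoSetGroups is false, Go
// also clears the child's supplementary groups.
func iptablesCommand(uid, euid int) *exec.Cmd {
	cmd := exec.Command("iptables", "--version")
	if needsRUIDReset(uid, euid) {
		cmd.SysProcAttr = &syscall.SysProcAttr{
			Credential: &syscall.Credential{Uid: 0, Gid: 0},
		}
	}
	return cmd
}

func main() {
	cmd := iptablesCommand(os.Getuid(), os.Geteuid())
	fmt.Println("reset real UID in child:", cmd.SysProcAttr != nil)
}
```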

Meanwhile, in order to enable networking on recent systems (debian 12+), singularity just needs to run as root.
  • arvados-dispatch-cloud -- already runs crunch-run→singularity as root.
  • slurm/lsf -- currently will fail to run a container when RuntimeConstraints.API is true. We could add a config (or detect the need by checking the iptables --version output) and run sudo singularity ... when needed...?
  • tests -- currently fail. We can:
    • if running as root, run the test
    • if not running as root and ARVADOS_TEST_USE_SUDO is set, run "sudo singularity ..." in tests that enable networking
    • if not running as root and ARVADOS_TEST_USE_SUDO is not set, skip the test
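The three test cases in the list above amount to a small decision function. A minimal sketch with hypothetical names (`testPlan`, `planNetworkingTest`); it treats ARVADOS_TEST_USE_SUDO as set whenever the variable exists in the environment, which is an assumption about how "set" is interpreted.

```go
package main

import (
	"fmt"
	"os"
)

// testPlan describes how a networking test should run.
type testPlan struct {
	Skip   bool     // skip the test entirely
	Prefix []string // command prefix placed before "singularity ..."
}

// planNetworkingTest encodes the three cases: run directly as root,
// run via sudo when ARVADOS_TEST_USE_SUDO is set, otherwise skip.
func planNetworkingTest(euid int, useSudo bool) testPlan {
	switch {
	case euid == 0:
		return testPlan{}
	case useSudo:
		return testPlan{Prefix: []string{"sudo"}}
	default:
		return testPlan{Skip: true}
	}
}

func main() {
	_, useSudo := os.LookupEnv("ARVADOS_TEST_USE_SUDO")
	plan := planNetworkingTest(os.Geteuid(), useSudo)
	fmt.Printf("%+v\n", plan)
}
```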

This branch also modifies TestIPAddress to allow a few seconds for the container to come up and start listening on the port, instead of failing immediately on "connection refused".

21750-singularity-networking @ b77a939d58754017500a9cd1352ac9979aeae119 -- developer-run-tests: #4468

#5

Updated by Tom Clegg 5 days ago

  • Status changed from New to In Progress
21750-singularity-networking @ 19163749aa54fbea03f642492abc75bbb4161c7b -- developer-run-tests: #4470
  • Adds a test to confirm that networking is enabled in a singularity container even when not running as root.
#6

Updated by Tom Clegg 5 days ago

21750-singularity-networking @ 78c476d822deaa9e772f5ceceb7e40ea4b9c0de8 -- developer-run-tests: #4473
  • rebased onto main (was based on 20756 but didn't need to be)

Summary:

The test failure was just a test failure; running singularity on an actual debian 12 compute node was not broken.

Now, the test no longer tries to use "fakeroot" to test port forwarding:
  • if tests run as root or with ARVADOS_TEST_USE_SUDO=1, it [uses sudo and] tests port forwarding in a way that still works on debian 12+
  • otherwise, that test is skipped
  • a separate test checks that API: true enables networking in the container; it doesn't depend on root, so it is never skipped
In production (unchanged):
  • in the cloud scenario, crunch-run invokes singularity as root, so networking works as desired
  • in the slurm/lsf scenario, crunch-run is not root, so a container with "API: true" just uses the host's network interfaces
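The production behavior summarized above can be read as a choice of singularity network flags depending on whether crunch-run runs as root. A hypothetical sketch for a container with API: true only: `apiNetworkArgs` is an invented name, and the `--net --network bridge` flags are the ones tried in #2, not necessarily what crunch-run actually passes.

```go
package main

import "fmt"

// apiNetworkArgs returns extra singularity arguments for a container
// whose runtime constraints include API: true. Hypothetical: the flags
// mirror those tried in this issue, not necessarily crunch-run's own.
func apiNetworkArgs(euid int) []string {
	if euid == 0 {
		// As root, singularity can set up a separate network namespace,
		// giving the container its own IP address.
		return []string{"--net", "--network", "bridge"}
	}
	// Not root: no --net flag, so the container shares the host's
	// network interfaces.
	return nil
}

func main() {
	fmt.Println(apiNetworkArgs(0))    // [--net --network bridge]
	fmt.Println(apiNetworkArgs(1000)) // []
}
```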
#7

Updated by Brett Smith 3 days ago

Tom Clegg wrote in #note-6:

21750-singularity-networking @ 78c476d822deaa9e772f5ceceb7e40ea4b9c0de8 -- developer-run-tests: #4473

This LGTM. My one suggestion would be to give the environment variable a generic name whose value specifies the privilege escalation method to use; e.g., ARVADOS_TEST_PRIVESC=sudo. run0 is a thing now, and in general there's a push to get away from setuid binaries, so I think we'll want to support other methods… eventually. But admittedly I don't know when. At least three years away and very possibly more. So I'm fine with that being tomorrow's problem too.
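Brett's suggested variable could look something like this. `ARVADOS_TEST_PRIVESC` is only a proposal (the branch uses ARVADOS_TEST_USE_SUDO), and `privescPrefix` is a hypothetical helper. Both sudo and run0 take the command to run as trailing arguments, so each maps to a one-word prefix; unknown values are rejected rather than silently ignored.

```go
package main

import (
	"fmt"
	"os"
)

// privescPrefix maps a value of the proposed ARVADOS_TEST_PRIVESC
// variable to a command prefix. An empty value means no escalation.
func privescPrefix(method string) ([]string, error) {
	switch method {
	case "":
		return nil, nil
	case "sudo", "run0":
		// Both accept the command to run as their remaining arguments.
		return []string{method}, nil
	default:
		return nil, fmt.Errorf("unsupported ARVADOS_TEST_PRIVESC value %q", method)
	}
}

func main() {
	prefix, err := privescPrefix(os.Getenv("ARVADOS_TEST_PRIVESC"))
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Println(append(prefix, "singularity", "--version"))
}
```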

#8

Updated by Tom Clegg 3 days ago

  • Status changed from In Progress to Resolved
#9

Updated by Peter Amstutz 2 days ago

  • Release set to 70