Feature #19099

closed

Support "arvados-client shell" when using arvados-dispatch-cloud + singularity

Added by Tom Clegg almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -
Release relationship: Auto

Description

Currently, the container-shell feature works only with arvados-dispatch-cloud + docker, because it relies on "docker exec" to inject a new process into an existing container.

As demonstrated in #18993, the nsenter program (part of the util-linux Debian package) can inject a new process into a running singularity container if crunch-run is running as root or the nsenter binary has setuid/setcap attributes. In the arvados-dispatch-cloud scenario, crunch-run runs as root. When we extend this to slurm/lsf dispatch (not covered here), we'll need a setuid/setcap setup.

So, when the runtime engine is singularity, we need to
  • find the PID of a process in the container
    • e.g., use "lsns" to list namespaced PIDs, find one that has a "pid" namespace and is a child of our singularity child process (parent process is 4th field of /proc/{pid}/stat [1])
  • find the IP address of the container (in order to implement port-forwarding)
    • When networking is enabled and we are running as root, use the singularity "--net" flag so the container can't access the host interface
      • currently we default to host networking instead of bridged, which seems to have been an oversight
      • if we are not running as root (e.g., LSF), it might work to use --fakeroot --net to isolate networking (a quick test with singularity 3.9.9 prints an error "Network fakeroot is not permitted for unprivileged users" but seems to work anyway -- seems like more investigation of compatibility/side effects is needed before making this change)
    • Parse /proc/{ctr_pid}/net/fib_trie and /proc/self/net/fib_trie; the "/32 host LOCAL" entry that is present in the former but missing from the latter is the container's local IP address.
  • use "nsenter --target={ctr_pid} --all cmd" instead of "docker exec -i cmd"
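
The IP-address and nsenter steps above can be sketched as follows. This is only an illustration under stated assumptions, not the crunch-run implementation: the function names (local_host_addrs, container_addrs, nsenter_argv) are hypothetical, the sample fib_trie text is made up and trimmed, and a real caller would read /proc/{ctr_pid}/net/fib_trie and /proc/self/net/fib_trie instead of the embedded samples.

```python
import re

# A leaf line in fib_trie looks like "        |-- 10.22.0.5".
LEAF = re.compile(r"^\s*\|--\s+(\S+)\s*$")


def local_host_addrs(fib_trie_text):
    """Return the set of addresses that carry a "/32 host LOCAL" entry."""
    addrs = set()
    current = None  # address of the most recent leaf line
    for line in fib_trie_text.splitlines():
        m = LEAF.match(line)
        if m:
            current = m.group(1)
        elif current is not None and line.strip() == "/32 host LOCAL":
            addrs.add(current)
    return addrs


def container_addrs(ctr_fib_trie, host_fib_trie):
    """Addresses local to the container's namespace but not to ours."""
    return local_host_addrs(ctr_fib_trie) - local_host_addrs(host_fib_trie)


def nsenter_argv(ctr_pid, cmd):
    """Build the nsenter command line used in place of "docker exec -i cmd"."""
    return ["nsenter", f"--target={ctr_pid}", "--all", *cmd]


# Made-up sample data (assumption), trimmed to the relevant shape.
SAMPLE_CTR = """\
Main:
  +-- 0.0.0.0/0 3 0 5
     |-- 0.0.0.0
        /0 universe UNICAST
     +-- 10.22.0.0/16 2 0 2
        |-- 10.22.0.0
           /16 link UNICAST
        |-- 10.22.0.5
           /32 host LOCAL
        |-- 10.22.255.255
           /32 link BROADCAST
     +-- 127.0.0.0/8 2 0 2
        |-- 127.0.0.1
           /32 host LOCAL
"""
SAMPLE_HOST = """\
Main:
  +-- 0.0.0.0/0 3 0 5
     +-- 127.0.0.0/8 2 0 2
        |-- 127.0.0.1
           /32 host LOCAL
     +-- 192.168.1.0/24 2 0 2
        |-- 192.168.1.10
           /32 host LOCAL
"""

if __name__ == "__main__":
    print(sorted(container_addrs(SAMPLE_CTR, SAMPLE_HOST)))
    print(nsenter_argv(12345, ["kill", "1"]))
```

Taking a set difference rather than picking the first "/32 host LOCAL" entry keeps the loopback address (present in both namespaces) out of the result. If the container ends up with more than one non-host address, the caller still has to choose among them.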

We should also have a test case for this -- perhaps start a "sleep 60" container, then use container gateway to run "kill 1" inside the container. Currently there is no automated test for the docker implementation.

[1] Parsing /proc/*/stat is slightly sketchy, because the command name in the second field can itself contain spaces and ")" -- start at the last ")"? ...or look for "\nPPid:\t%d\n" in /proc/*/status instead. For example:

$ ln -s `which cat` '/tmp/patho logical)'
$ '/tmp/patho logical)' /proc/self/stat
3552009 (patho logical)) R 3542894 3552009 3542894 34850 3552009 4194304 101 0 0 0 0 0 0 0 20 0 1 0 275456935 8163328 129 18446744073709551615 93951078178816 93951078196905 140729773453040 0 0 0 0 0 0 0 0 0 17 5 0 0 0 0 0 93951078214736 93951078216320 93951102492672 140729773454886 140729773454922 140729773454922 140729773457380 0
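
Both approaches from the footnote can be sketched as below, along with one possible way to pick the container process. This is a hedged sketch, not the actual crunch-run code: all function names are hypothetical, SAMPLE_STAT is the start of the stat line shown above, and find_container_pid assumes that a direct child of the singularity process whose pid namespace differs from ours is the right target, which has not been verified against any particular singularity version.

```python
import os

# Start of the stat line from the example above (remaining fields omitted).
SAMPLE_STAT = "3552009 (patho logical)) R 3542894 3552009 3542894 34850"


def ppid_from_stat(stat_text):
    """Parse the parent PID from /proc/{pid}/stat content.

    The command name may contain spaces and ")", so skip to the last ")"
    and split only what follows: state, ppid, pgrp, ...
    """
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    return int(rest[1])


def ppid_from_status(status_text):
    """Alternative: read the "PPid:" line of /proc/{pid}/status content."""
    for line in status_text.splitlines():
        if line.startswith("PPid:"):
            return int(line.split()[1])
    raise ValueError("no PPid line found")


def find_container_pid(singularity_pid, proc="/proc"):
    """Return a child of singularity_pid that is in a different pid
    namespace than this process, or None if there is none (assumption:
    such a child is the process we want to nsenter into)."""
    own_ns = os.readlink(os.path.join(proc, "self", "ns", "pid"))
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat"), "rb") as f:
                ppid = ppid_from_stat(f.read().decode("utf-8", "replace"))
            ns = os.readlink(os.path.join(proc, name, "ns", "pid"))
        except (OSError, ValueError):
            continue  # process exited, unreadable, or malformed stat
        if ppid == singularity_pid and ns != own_ns:
            return int(name)
    return None


if __name__ == "__main__":
    print(ppid_from_stat(SAMPLE_STAT))
```

A naive whitespace split of the sample line would put "R" in the 4th position, which is why the sketch anchors on the last ")" first.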


Subtasks: 1 (0 open, 1 closed)

Task #19115: Review 19099-singularity-container-shell (Resolved, Tom Clegg, 05/17/2022)

Related issues

Related to Arvados - Idea #18993: research project: design for being able to start up shell inside an existing singularity container (Resolved, Tom Clegg)