Feature #20756
closed
Support crunchstat tracking and memory limits with singularity
Added by Tom Clegg over 1 year ago.
Updated about 2 months ago.
Release relationship:
Auto
Description
Singularity can put the container in a new cgroup and set resource usage limits. Even without applying any limits, this also enables resource usage tracking by crunchstat.
https://docs.sylabs.io/guides/3.0/user-guide/cgroups.html
The docs say "the --apply-cgroups option can only be used with root privileges" but these tests worked as a non-root user:
$ singularity version
3.10.4-dirty
$ singularity exec --apply-cgroups /dev/null docker://debian:12 sleep 600 &
[1] 60133
$ pstree -up | grep sleep
| | `-starter-suid(60133)-+-sleep(60151)
$ cat /proc/60133/cgroup
0::/user.slice/user-1000.slice/session-5424.scope
$ cat /proc/60151/cgroup
0::/user.slice/user-1000.slice/user@1000.service/user.slice/singularity-60151.scope
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/user.slice/singularity-60151.scope/memory.current
2465792
$ singularity exec --apply-cgroups <(printf '[memory]\n limit = 5000000\n') docker://debian:12 echo ok
ok
$ singularity exec --apply-cgroups <(printf '[memory]\n limit = 5000\n') docker://debian:12 echo ok
Killed
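The manual check above (look up the container process's cgroup, then read memory.current) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup, as in the session shown.

```shell
#!/bin/sh
# Hypothetical helper: print the cgroup v2 path found in a
# /proc/PID/cgroup-style file. On a unified (v2) hierarchy the
# relevant line has the form "0::/some/path".
cgroup_v2_path() {
    sed -n 's/^0:://p' "$1" | head -n 1
}

# Hypothetical helper: print the current memory usage (bytes) of the
# cgroup that the given pid belongs to.
# Assumption: cgroup v2 is mounted at /sys/fs/cgroup.
memory_current_of_pid() {
    cg=$(cgroup_v2_path "/proc/$1/cgroup")
    [ -n "$cg" ] || return 1
    cat "/sys/fs/cgroup${cg}/memory.current"
}
```

With the pids from the session above, `memory_current_of_pid 60151` would read the container's scope, whereas passing the wrapper pid (60133) would read the login session's scope instead.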
As of #17244 crunch-run does not correctly identify the pid of a process inside the container when telling crunchstat which process/cgroup to monitor (it returns the pid of the singularity executor wrapper instead). This will also need to be fixed in order for crunchstat to work correctly.
- Target version set to Future
- Category set to Docker
- Tracker changed from Bug to Feature
- Blocked by Bug #17244: Make sure cgroupsV2 works with Arvados added
Tom Clegg wrote:
The docs say "the --apply-cgroups option can only be used with root privileges" but these tests worked as a non-root user:
- It's possible that line was written before user namespaces were widely available/enabled, and has become obsolete since. The timeline kinda works: Singularity 3.0.0 was released October 2018, and Debian got user namespaces in 11, released August 2021.
- But also, if you're going through starter-suid, don't you have root privileges at some level?
- Target version changed from Future to Development 2023-08-30
- Assigned To set to Tom Clegg
- Target version changed from Development 2023-08-30 to Development 2023-09-13 sprint
- Target version changed from Development 2023-09-13 sprint to Development 2023-09-27 sprint
- Target version changed from Development 2023-09-27 sprint to To be scheduled
- Target version changed from To be scheduled to Future
- Related to Bug #22050: Pid() did not return a process ID (bug in singularity support?) added
- Target version changed from Future to Development 2024-10-09 sprint
- Status changed from New to In Progress
20756-singularity-cgroups @ 896ab4b3b411f532da98e874d150b4836416172c -- developer-run-tests: #4471
Tests pass with
- singularity 3.10.4 on debian 11 (jenkins)
- singularity 3.10.4 on debian 12
- singularity 4.2.1 on debian 12
Singularity cgroup support has a number of dependencies (kernel ≥ 4.15, systemd ≥ 224, ... see https://docs.sylabs.io/guides/latest/user-guide/cgroups.html).
I've added some code to skip applying resource limits if we can tell they're not supported (in particular, the default debian install allows memory limits but not cpu limits). But I think we need to do one or more of the following:
- also check that systemd appears to be installed/working
- add upgrade/install docs that advise running singularity run --cpus 1 --memory 10000000 busybox echo OK to ensure compatibility
- add a config entry to disable singularity resource limits
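For illustration, the "skip limits we can tell are unsupported" idea could look something like the sketch below. It is not the code in the branch: the function names are hypothetical, and it assumes that the controllers delegated to the user are listed, space-separated, in a cgroup v2 cgroup.controllers file. Only the --cpus and --memory flags come from the command quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named controller appears in a
# cgroup.controllers file (cgroup v2 lists them space-separated).
# $1 = path to a cgroup.controllers file, $2 = controller name
has_controller() {
    [ -r "$1" ] || return 1
    tr ' ' '\n' < "$1" | grep -qx "$2"
}

# Hypothetical helper: print only the singularity limit flags whose
# controller is available; print an empty line if none are.
# $1 = cgroup.controllers path, $2 = cpu count, $3 = memory bytes
limit_flags() {
    flags=""
    if has_controller "$1" cpu; then
        flags="--cpus $2"
    fi
    if has_controller "$1" memory; then
        flags="${flags:+$flags }--memory $3"
    fi
    printf '%s\n' "$flags"
}
```

Which file to inspect is itself an assumption. Judging by the scope path in the description, the user's delegated cgroup would be something like /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/cgroup.controllers.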
- Related to deleted (Bug #22050: Pid() did not return a process ID (bug in singularity support?))
- Has duplicate Bug #22050: Pid() did not return a process ID (bug in singularity support?) added
20756-singularity-cgroups @ 0fc35e3fcda0dcae3ddf48053e1b26f244d61cea -- developer-run-tests: #4474
retry fuse: developer-run-tests-doc-pysdk-api-fuse: #424
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- ✅ singularity applies cpu/RAM limits if (as far as we can tell) that is supported
- ✅ this also enables crunchstat resource tracking
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- If this branch looks good, then we should probably update the compute image build script to run singularity run --cpus 1 --memory 10000000 busybox echo OK (advice welcome)
- Added #22161 to update our recommended singularity version from 3.x to 4.x (I tried 4.x in case it changed the cgroup behavior and it didn't seem to change anything at all -- I don't want to creep scope here, but we should probably update to 4.x if we have no reason to stay on 3.x)
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- automated tests pass on debian 11 and 12
- New or changed UI/UX has gotten feedback from stakeholders.
- Documentation has been updated.
- Behaves appropriately at the intended scale (describe intended scale).
- Considered backwards and forwards compatibility issues between client and server.
- If the new systemd/cgroup compatibility tests are negative, we fall back to the previous behavior (no resource limits, no crunchstat tracking)
- Follows our coding standards and GUI style guidelines.
Feedback welcome on whether systemd-run --wait --user true is a reasonable way to probe for a working systemd setup.
Tom Clegg wrote in #note-14:
Feedback welcome on whether systemd-run --wait --user true is a reasonable way to probe for a working systemd setup.
The "standard" test is to just check whether /run/systemd/system is a directory. I guess if you wanted to be super extra sure that a user daemon was running, you could check $XDG_RUNTIME_DIR/systemd/user instead.
20756-singularity-cgroups @ a0ee23096d91d2f9d010c5338df8a0c82daddcf8 -- developer-run-tests: #4478
- update systemd test to check that $XDG_RUNTIME_DIR/systemd is a directory and DBUS_SESSION_BUS_ADDRESS is set, instead of invoking systemd-run.
- don't do the systemd test if running as root (this avoids introducing unnecessary dependencies/checks in a cloud config).
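As a rough illustration of the probe described in these two points (not the actual code in the branch), the check could be written as a small shell function. The function name and its argument order are hypothetical. The values are passed as arguments rather than read from the environment so the logic can be exercised in isolation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a per-user systemd session looks usable.
# $1 = effective uid
# $2 = value of XDG_RUNTIME_DIR (may be empty)
# $3 = value of DBUS_SESSION_BUS_ADDRESS (may be empty)
user_systemd_looks_ok() {
    # When running as root, skip the user-session checks entirely.
    if [ "$1" -eq 0 ]; then
        return 0
    fi
    # Otherwise require $XDG_RUNTIME_DIR/systemd to be a directory
    # and DBUS_SESSION_BUS_ADDRESS to be non-empty.
    [ -n "$2" ] && [ -d "$2/systemd" ] && [ -n "$3" ]
}
```

A caller might invoke it as `user_systemd_looks_ok "$(id -u)" "${XDG_RUNTIME_DIR:-}" "${DBUS_SESSION_BUS_ADDRESS:-}"` and fall back to running without resource limits when it fails.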
Branch 20756-singularity-cgroups at a0ee230 LGTM, thanks.
- Status changed from In Progress to Resolved