Bug #22185
closed
fix tordo compute image to support cgroup limits with singularity
Updated by Peter Amstutz 2 months ago
Tom will figure out exactly what is wrong and how to fix it, and will work with ops to update packer and deploy a new image if necessary.
Updated by Tom Clegg 2 months ago
tordo's compute nodes use cgroups v2 "unified" mode (i.e., both cgroups v1 and cgroups v2 interfaces are available). But they fail crunch-run's test for cgroups v2 memory/cpu limit support because cgroup.controllers is empty:
root@ip-10-253-254-63:~# grep cgroup2 /proc/mounts
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
root@ip-10-253-254-63:~# cat /sys/fs/cgroup/unified/cgroup.controllers
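For illustration, a check of this kind can be sketched as follows. This is a minimal sketch, not crunch-run's actual code: the function name `cgroup2_supports_limits` and the `needed` parameter are hypothetical, and it assumes the check only needs the `memory` and `cpu` controllers to be listed in cgroup.controllers.

```python
def cgroup2_supports_limits(controllers_text, needed=("memory", "cpu")):
    """Return True if every needed controller is listed in cgroup.controllers.

    controllers_text is the content of a cgroup.controllers file, which is a
    whitespace-separated list of controller names (hypothetical helper, not
    crunch-run's implementation).
    """
    available = set(controllers_text.split())
    return all(name in available for name in needed)


# An empty file, as in the transcript above, means no controllers are
# available through the cgroups v2 interface.
print(cgroup2_supports_limits(""))
# A file listing the usual controllers passes the check.
print(cgroup2_supports_limits("cpuset cpu io memory pids\n"))
```

On a live node the argument would come from reading the cgroup.controllers file under the cgroup2 mount point reported by /proc/mounts.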
Singularity is in fact capable of enforcing limits:
root@ip-10-253-254-63:~# singularity exec --containall --cleanenv --pwd= --memory 123456789 --cpus 1 docker://busybox:uclibc echo ok
INFO: Using cached SIF image
ok
root@ip-10-253-254-63:~# singularity exec --containall --cleanenv --pwd= --memory 1234 --cpus 1 docker://busybox:uclibc echo ok
INFO: Using cached SIF image
Killed
AFAICT this means singularity is using the cgroups v1 interface.
root@ip-10-253-254-63:~# grep -Ew 'memory|cpu' /proc/self/cgroup
6:cpu,cpuacct:/user.slice
4:memory:/user.slice/user-1000.slice/session-5.scope
So, when the cgroups v2 support check fails, and we're running as root, we can enable limits based on cgroups v1 support if the relevant controller names appear in /proc/self/cgroup.
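The fallback described here could look roughly like the sketch below. It is an assumption-laden illustration rather than the code in the branch: the function names and the `is_root` parameter are hypothetical, and it assumes that "relevant controllers" means `memory` and `cpu`. Each line of /proc/self/cgroup has the form `hierarchy-ID:controller-list:path`, and the cgroups v2 entry has an empty controller list.

```python
def cgroup1_controllers(proc_self_cgroup_text):
    """Return the set of controller names attached to cgroups v1 hierarchies."""
    controllers = set()
    for line in proc_self_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy_id, names, _path = parts
        for name in names.split(","):
            # The v2 entry ("0::/path") has an empty list; named hierarchies
            # such as "name=systemd" are not resource controllers.
            if name and not name.startswith("name="):
                controllers.add(name)
    return controllers


def can_use_cgroup1_limits(proc_self_cgroup_text, is_root, needed=("memory", "cpu")):
    """Hypothetical fallback check: root, and all needed v1 controllers present."""
    found = cgroup1_controllers(proc_self_cgroup_text)
    return is_root and all(name in found for name in needed)


# The two lines shown by the grep command above.
SAMPLE = (
    "6:cpu,cpuacct:/user.slice\n"
    "4:memory:/user.slice/user-1000.slice/session-5.scope\n"
)
print(sorted(cgroup1_controllers(SAMPLE)))
print(can_use_cgroup1_limits(SAMPLE, is_root=True))
```

In real use the text would be read from /proc/self/cgroup and `is_root` would come from something like `os.geteuid() == 0`.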
Updated by Peter Amstutz 2 months ago
Isn't cgroups v1 on its way out, though? Can singularity use cgroups v2? Why is /sys/fs/cgroup/unified/cgroup.controllers empty?
Updated by Tom Clegg 2 months ago
Yes, cgroups v1 is on its way out and yes, Singularity can use cgroups v2.
I think I've found the answer to why /sys/fs/cgroup/unified/cgroup.controllers is empty, and it is indeed only an issue when cgroups v2 is being used in "unified" mode, i.e., v1 is also available.
- From the 'mounting' section of the cgroup docs:
  - "A controller can be moved across hierarchies only after the controller is no longer referenced in its current hierarchy."
  - "During transition to v2, system management software might still automount the v1 cgroup filesystem and so hijack all controllers during boot"
- Restated more pointedly in a stackoverflow answer:
  - "cgroup controllers can only be mounted in one hierarchy (v1 or v2). If you have a controller mounted on a legacy v1 hierarchy, then it won't show up in the cgroup2 hierarchy"
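One way to see which controllers have been claimed by v1 hierarchies is /proc/cgroups, where the second column is the hierarchy ID and is zero for a controller that is not mounted on any v1 hierarchy. The sketch below parses that table; the function name is hypothetical and the sample text is made up for illustration (it was not captured from a tordo node).

```python
def controllers_claimed_by_v1(proc_cgroups_text):
    """Return names of controllers whose hierarchy ID in /proc/cgroups is nonzero."""
    claimed = []
    for line in proc_cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip the header and blank lines
        fields = line.split()
        if len(fields) < 2:
            continue
        name, hierarchy_id = fields[0], int(fields[1])
        if hierarchy_id != 0:
            # Attached to a v1 hierarchy, so it cannot appear in the
            # cgroup2 hierarchy's cgroup.controllers.
            claimed.append(name)
    return claimed


# Illustrative sample only (hypothetical values).
SAMPLE_PROC_CGROUPS = (
    "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
    "cpu\t6\t50\t1\n"
    "memory\t4\t90\t1\n"
    "pids\t0\t1\t1\n"
)
print(controllers_claimed_by_v1(SAMPLE_PROC_CGROUPS))
```

If `memory` and `cpu` show up in this list on a node, that is consistent with the explanation quoted above for an empty cgroup.controllers file.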
Updated by Tom Clegg 2 months ago
22185-singularity-cgroups-v1 @ a4ff70e0eab76e0098a0bff0a6819564d1b302f6 -- developer-run-tests: #4508