Bug #22185
closed
fix tordo compute image to support cgroup limits with singularity
Updated by Peter Amstutz 19 days ago
Tom to figure out exactly what is wrong and how to fix it, will work with ops to update packer & deploy new image if necessary.
Updated by Tom Clegg 18 days ago
tordo's compute nodes use cgroups v2 "unified" mode (i.e., both cgroups v1 and cgroups v2 interfaces are available). But they fail crunch-run's test for cgroups2 memory/cpu limit support because cgroup.controllers is empty:
root@ip-10-253-254-63:~# grep cgroup2 /proc/mounts
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
root@ip-10-253-254-63:~# cat /sys/fs/cgroup/unified/cgroup.controllers
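For illustration, the kind of check that fails here can be sketched in Python. This is not crunch-run's actual code (crunch-run is written in Go); the function names `cgroup2_mountpoint` and `v2_limits_supported` are hypothetical, and the sketch assumes the check only needs the `memory` and `cpu` controllers to be listed in cgroup.controllers.

```python
def cgroup2_mountpoint(mounts_text):
    """Return the mount point of the first cgroup2 filesystem listed in
    /proc/mounts-style text, or None if there is none."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, ...
        if len(fields) >= 3 and fields[2] == "cgroup2":
            return fields[1]
    return None


def v2_limits_supported(controllers_text, needed=("memory", "cpu")):
    """Return True if every needed controller is named in the contents of
    cgroup.controllers (a space-separated list, possibly empty)."""
    available = set(controllers_text.split())
    return all(name in available for name in needed)


def check_host():
    """Hypothetical helper: run both steps against the live system."""
    with open("/proc/mounts") as f:
        mountpoint = cgroup2_mountpoint(f.read())
    if mountpoint is None:
        return False
    with open(mountpoint + "/cgroup.controllers") as f:
        return v2_limits_supported(f.read())
```

With the output shown above, the mount point is found but cgroup.controllers is empty, so a check along these lines reports that v2 limits are not supported.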
Singularity is in fact capable of enforcing limits:
root@ip-10-253-254-63:~# singularity exec --containall --cleanenv --pwd= --memory 123456789 --cpus 1 docker://busybox:uclibc echo ok
INFO: Using cached SIF image
ok
root@ip-10-253-254-63:~# singularity exec --containall --cleanenv --pwd= --memory 1234 --cpus 1 docker://busybox:uclibc echo ok
INFO: Using cached SIF image
Killed
AFAICT this means singularity is using the cgroups v1 interface.
root@ip-10-253-254-63:~# grep -Ew 'memory|cpu' /proc/self/cgroup
6:cpu,cpuacct:/user.slice
4:memory:/user.slice/user-1000.slice/session-5.scope
So, when the cgroups v2 support check fails, and we're running as root, we can enable limits based on cgroups v1 support if the relevant controller names appear in /proc/self/cgroup.
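A minimal sketch of that fallback, in Python rather than crunch-run's Go, and not the code that was merged. The names `v1_controllers` and `can_fall_back_to_v1` are hypothetical, and the sketch assumes "relevant controllers" means `memory` and `cpu`. Each line of /proc/self/cgroup has the form `hierarchy-ID:controller-list:path`, and the cgroups v2 entry has an empty controller list.

```python
def v1_controllers(proc_self_cgroup_text):
    """Collect controller names from /proc/self/cgroup-style text.

    Named hierarchies without controllers (e.g. "name=systemd") end up in
    the set too, which is harmless because they never match a real
    controller name.
    """
    names = set()
    for line in proc_self_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed or blank lines
        for name in parts[1].split(","):
            if name:  # the v2 entry "0::/path" has an empty list
                names.add(name)
    return names


def can_fall_back_to_v1(proc_self_cgroup_text, running_as_root,
                        needed=("memory", "cpu")):
    """Return True if we are root and all needed controllers are attached
    to cgroups v1 hierarchies."""
    if not running_as_root:
        return False
    return set(needed) <= v1_controllers(proc_self_cgroup_text)
```

In real use the text would come from reading /proc/self/cgroup and the root check from something like `os.geteuid() == 0`; both are passed in as arguments here so the logic can be exercised without touching the host.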
Updated by Peter Amstutz 18 days ago
Isn't cgroups v1 on its way out, though? Can singularity use cgroups v2? Why is /sys/fs/cgroup/unified/cgroup.controllers empty?
Updated by Tom Clegg 18 days ago
Yes, cgroups v1 is on its way out and yes, Singularity can use cgroups v2.
I think I've found the answer to why /sys/fs/cgroup/unified/cgroup.controllers is empty, and it is indeed only an issue when cgroups v2 is being used in "unified" mode, i.e., v1 is also available.
- From 'mounting' section of cgroup docs
  - "A controller can be moved across hierarchies only after the controller is no longer referenced in its current hierarchy."
  - "During transition to v2, system management software might still automount the v1 cgroup filesystem and so hijack all controllers during boot"
- Restated more pointedly in a stackoverflow answer
  - "cgroup controllers can only be mounted in one hierarchy (v1 or v2). If you have a controller mounted on a legacy v1 hierarchy, then it won't show up in the cgroup2 hiearchy"
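One way to see which controllers are currently claimed by v1 hierarchies is to read /proc/cgroups, where a non-zero value in the hierarchy column means the controller is attached to a cgroups v1 hierarchy. The Python sketch below is illustrative only; the function name `controllers_held_by_v1` is hypothetical and this is not something crunch-run is known to do.

```python
def controllers_held_by_v1(proc_cgroups_text):
    """Return a sorted list of controllers attached to a v1 hierarchy.

    /proc/cgroups columns: subsys_name, hierarchy, num_cgroups, enabled.
    A hierarchy ID of 0 means the controller is not mounted on any v1
    hierarchy (it is bound to v2, unmounted, or disabled).
    """
    held = []
    for line in proc_cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split()
        if len(fields) < 2:
            continue
        name, hierarchy = fields[0], fields[1]
        if hierarchy != "0":
            held.append(name)
    return sorted(held)
```

Any controller this returns would be expected to be missing from the cgroup2 hierarchy's cgroup.controllers file, which matches the behavior described in the quotes above.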
Updated by Tom Clegg 17 days ago
22185-singularity-cgroups-v1 @ a4ff70e0eab76e0098a0bff0a6819564d1b302f6 -- developer-run-tests: #4508