Task #20835
Bug #17244 (closed): Make sure cgroupsV2 works with Arvados
Update tordo compute image kernel config from "hybrid" to "unified" mode
Updated by Lucas Di Pentima 11 months ago
- Start date set to 08/09/2023
- Status changed from New to In Progress
Updated by Lucas Di Pentima 11 months ago
Updates at c56b8aaf7 - branch 20835-cgroupsv2-unified-mode
Compute image build pipeline for tordo: packer-build-compute-image: #231
- Re-enables unified_cgroup_hierarchy in GRUB's config.
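To confirm on a booted instance that the GRUB change actually reached the kernel, the command line can be inspected. The following is a minimal sketch, not part of the branch: it assumes the parameter appears in its full systemd spelling, systemd.unified_cgroup_hierarchy, and the sample command line is hypothetical. The set of accepted "true" spellings and the last-occurrence-wins rule are also assumptions of this sketch.

```python
def kernel_params(cmdline: str) -> dict:
    """Map each kernel command-line parameter to the list of values it was given."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # Repeated parameters (e.g. cgroup_enable=...) keep every value, in order.
        params.setdefault(key, []).append(value)
    return params


def unified_requested(cmdline: str) -> bool:
    """True if the command line asks systemd for the unified cgroup hierarchy."""
    values = kernel_params(cmdline).get("systemd.unified_cgroup_hierarchy", [])
    # Assumption: the last occurrence is authoritative, and only these
    # spellings count as "enabled".
    return bool(values) and values[-1] in ("1", "yes", "true", "on")


# Hypothetical command line, for illustration only.
SAMPLE_CMDLINE = "BOOT_IMAGE=/boot/vmlinuz root=/dev/xvda1 ro systemd.unified_cgroup_hierarchy=1"
print(unified_requested(SAMPLE_CMDLINE))
```

On a live instance the input would come from reading /proc/cmdline instead of the sample string.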
Updated by Lucas Di Pentima 11 months ago
Previous pipeline failed. I suspect it's related to golang changes, so I've rebased the branch to start from 0b296b5:
Updates at 5d5c219
New build pipeline: packer-build-compute-image: #232
Updated by Lucas Di Pentima 11 months ago
Built and tested the AMI ami-01cbd6fb77f29928f on an instance and got the following:
$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
I'll deploy the saltstack changes so tordo can use it.
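The mount check above can be automated when validating future images. This is a rough sketch that classifies the layout from `mount` output; the function name is made up for illustration, and the "hybrid" rule assumes the usual systemd layout where cgroup2 is mounted at /sys/fs/cgroup/unified next to the v1 controller mounts.

```python
def cgroup_mode(mount_output: str) -> str:
    """Classify the cgroup layout as unified, hybrid, legacy, or unknown."""
    mounts = {}
    for line in mount_output.splitlines():
        fields = line.split()
        # Expected shape: <source> on <mountpoint> type <fstype> (<options>)
        if len(fields) >= 5 and fields[1] == "on" and fields[3] == "type":
            mounts[fields[2]] = fields[4]
    if mounts.get("/sys/fs/cgroup") == "cgroup2":
        return "unified"
    # Assumption: hybrid mode exposes cgroup2 under /sys/fs/cgroup/unified.
    if mounts.get("/sys/fs/cgroup/unified") == "cgroup2":
        return "hybrid"
    if any(fstype == "cgroup" for fstype in mounts.values()):
        return "legacy"
    return "unknown"


# The mount line reported for the new AMI.
NEW_IMAGE_MOUNTS = (
    "cgroup2 on /sys/fs/cgroup type cgroup2 "
    "(rw,nosuid,nodev,noexec,relatime,nsdelegate)"
)
print(cgroup_mode(NEW_IMAGE_MOUNTS))
```

Mount points containing spaces would confuse the simple field split; that is not a concern for the cgroup paths involved here.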
Updated by Lucas Di Pentima 11 months ago
I've tried running a workflow on tordo and got the following crunchstat.txt log:
2023-08-10T14:36:40.138803409Z warning: Pid() did not return a process ID after 10s (config error?) -- still waiting...
2023-08-10T14:41:19.987801929Z warning: Pid() never returned a process ID
Does this mean that cgroupsv2 is improperly set up?
Updated by Lucas Di Pentima 11 months ago
Thanks for the pointer. I've temporarily set it to use Docker and ran a test workflow; these are the crunchstat.txt contents:
2023-08-10T17:44:42.864838496Z notice: reading stats from /sys/fs/cgroup/system.slice/docker-c7d2e9d0202cadbc64fbeb59416280e64bbce207823fc72443b8149b8e190135.scope/cpu.stat
2023-08-10T17:44:42.864871357Z notice: reading stats from /sys/fs/cgroup/system.slice/docker-c7d2e9d0202cadbc64fbeb59416280e64bbce207823fc72443b8149b8e190135.scope/io.stat
2023-08-10T17:44:42.864909464Z notice: reading stats from /sys/fs/cgroup/system.slice/docker-c7d2e9d0202cadbc64fbeb59416280e64bbce207823fc72443b8149b8e190135.scope/memory.stat
2023-08-10T17:44:42.864929381Z notice: reading stats from /sys/fs/cgroup/system.slice/docker-c7d2e9d0202cadbc64fbeb59416280e64bbce207823fc72443b8149b8e190135.scope/memory.current
2023-08-10T17:44:42.864946680Z notice: reading stats from /sys/fs/cgroup/system.slice/docker-c7d2e9d0202cadbc64fbeb59416280e64bbce207823fc72443b8149b8e190135.scope/memory.swap.current
2023-08-10T17:44:42.865057162Z using /proc/3622/net/dev
2023-08-10T17:44:42.865064953Z notice: monitoring temp dir /tmp/crunch-run.tordo-dz642-woqmi5ntdng12pt.3856672971
2023-08-10T17:44:42.865184004Z mem 0 swap 0 pgmajfault 974848 rss
2023-08-10T17:44:42.866161418Z cpu 0.0295 user 0.0088 sys 0 cpus
2023-08-10T17:44:42.866207498Z blkio:259:4 0 write 167936 read
2023-08-10T17:44:42.866212185Z blkio:254:0 0 write 167936 read
2023-08-10T17:44:42.866241993Z net:eth0 0 tx 340 rx
2023-08-10T17:44:42.866255312Z statfs 199043186688 available 440197120 used 210237366272 total
Updated by Tom Clegg 11 months ago
Hm, "0 cpus" doesn't look right.
2023-08-10T17:44:42.866161418Z cpu 0.0295 user 0.0088 sys 0 cpus
In #17244#note-32 (before updating the image) we had this line:
2023-08-08T16:50:58.455605354Z notice: reading stats from /sys/fs/cgroup/system.slice/docker-d5b44b0d9c1178e64c87f7a90df8da5ace1c46e203022fe27de1f130017b62bc.scope/cpuset.cpus.effective
I don't see a corresponding cpuset.cpus.effective line in #note-7.
I wonder if we need cgroup_enable=memory cgroup_enable=cpuset ...? It's weird/annoying that these defaults seem so unpredictable.
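For context on why a missing cpuset.cpus.effective could show up as "0 cpus": the file holds a kernel CPU list such as "0-3,8", and a reader that finds no file or an empty list has nothing to count. The sketch below counts CPUs in that list format. It only illustrates the format; that crunchstat derives its "cpus" figure this way is an assumption here, not something confirmed in this thread.

```python
def count_cpus(cpuset_list: str) -> int:
    """Count the CPUs named in a kernel cpuset list such as "0-3,8"."""
    total = 0
    for part in cpuset_list.strip().split(","):
        if not part:
            continue  # empty input lists no CPUs at all
        first, sep, last = part.partition("-")
        # "a-b" is an inclusive range; a bare number is a single CPU.
        total += (int(last) - int(first) + 1) if sep else 1
    return total


for sample in ("0-3,8", "0", ""):
    print(repr(sample), count_cpus(sample))
```

An empty string counts as zero CPUs, which would match the symptom in the log if the file is absent or unreadable.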
Updated by Lucas Di Pentima 11 months ago
I've just launched a test instance with the latest image, and got this:
$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 0 100 1
cpu 0 100 1
cpuacct 0 100 1
blkio 0 100 1
memory 0 100 1
devices 0 100 1
freezer 0 100 1
net_cls 0 100 1
perf_event 0 100 1
net_prio 0 100 1
pids 0 100 1
rdma 0 100 1
The plot thickens...
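To make the table above easier to compare across images, here is a small sketch that parses /proc/cgroups into a dictionary. The function name is invented for illustration. Note that this file only says whether the kernel has a controller enabled; whether it is available inside a particular v2 cgroup is reported separately by that cgroup's cgroup.controllers file, which this sketch does not read.

```python
def parse_proc_cgroups(text: str) -> dict:
    """Parse /proc/cgroups into {controller: {hierarchy, num_cgroups, enabled}}."""
    controllers = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the column header
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers[name] = {
            "hierarchy": int(hierarchy),
            "num_cgroups": int(num_cgroups),
            "enabled": enabled == "1",
        }
    return controllers


# The output captured on the test instance.
TEST_INSTANCE = """#subsys_name hierarchy num_cgroups enabled
cpuset 0 100 1
cpu 0 100 1
cpuacct 0 100 1
blkio 0 100 1
memory 0 100 1
devices 0 100 1
freezer 0 100 1
net_cls 0 100 1
perf_event 0 100 1
net_prio 0 100 1
pids 0 100 1
rdma 0 100 1
"""

parsed = parse_proc_cgroups(TEST_INSTANCE)
disabled = [name for name, info in parsed.items() if not info["enabled"]]
print("cpuset enabled:", parsed["cpuset"]["enabled"], "disabled:", disabled)
```

With the captured output, cpuset and memory both come back as enabled, so the kernel-level defaults do not explain the missing cpuset.cpus.effective line by themselves.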
Updated by Lucas Di Pentima 11 months ago
- Remaining (hours) set to 0.0
- Status changed from In Progress to Resolved