Feature #20756

closed

Support crunchstat tracking and memory limits with singularity

Added by Tom Clegg over 1 year ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Docker
Story points: 2.0
Release:
Release relationship: Auto

Description

Singularity can put the container in a new cgroup and set resource usage limits. Even without any limits applied, this also enables resource usage tracking by crunchstat.

https://docs.sylabs.io/guides/3.0/user-guide/cgroups.html

The docs say "the --apply-cgroups option can only be used with root privileges" but these tests worked as a non-root user:

$ singularity version
3.10.4-dirty
$ singularity exec --apply-cgroups /dev/null docker://debian:12 sleep 600 &
[1] 60133
$ pstree -up | grep sleep
           |                     |             `-starter-suid(60133)-+-sleep(60151)
$ cat /proc/60133/cgroup
0::/user.slice/user-1000.slice/session-5424.scope
$ cat /proc/60151/cgroup
0::/user.slice/user-1000.slice/user@1000.service/user.slice/singularity-60151.scope
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/user.slice/singularity-60151.scope/memory.current 
2465792
$ singularity exec --apply-cgroups <(printf '[memory]\n limit = 5000000\n') docker://debian:12 echo ok
ok
$ singularity exec --apply-cgroups <(printf '[memory]\n limit = 5000\n') docker://debian:12 echo ok
Killed
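
The limits file passed to --apply-cgroups in the last two commands is a small TOML document. As a minimal sketch, assuming only the [memory] limit key shown above is needed, it could be generated like this (the helper name memory_limit_toml is hypothetical, not part of crunch-run or Singularity):

```python
def memory_limit_toml(limit_bytes: int) -> str:
    """Return cgroups TOML text containing a single [memory] limit, in bytes.

    Hypothetical helper: mirrors the printf input used in the shell session.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return f"[memory]\n limit = {limit_bytes}\n"


if __name__ == "__main__":
    # Same content as: printf '[memory]\n limit = 5000000\n'
    print(memory_limit_toml(5000000), end="")
```

The text would then be written to a temporary file (or a pipe, as in the session above) and its path passed to --apply-cgroups.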

As of #17244 crunch-run does not correctly identify the pid of a process inside the container when telling crunchstat which process/cgroup to monitor (it returns the pid of the singularity executor wrapper instead). This will also need to be fixed in order for crunchstat to work correctly.
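
To illustrate why the pid matters, here is a rough Python sketch that maps the contents of /proc/<pid>/cgroup to the matching memory.current file on a cgroup v2 system. The function names and the default /sys/fs/cgroup mount point are assumptions for illustration; this is not how crunch-run or crunchstat is implemented.

```python
from pathlib import PurePosixPath


def cgroup_v2_path(proc_cgroup_text: str) -> str:
    """Return the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    The unified (v2) hierarchy entry has the form "0::<path>".
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        if hierarchy_id == "0" and controllers == "":
            return path
    raise ValueError("no cgroup v2 entry found")


def memory_current_file(proc_cgroup_text: str, root: str = "/sys/fs/cgroup") -> str:
    """Return the memory.current path for that cgroup (root is an assumed mount point)."""
    relative = cgroup_v2_path(proc_cgroup_text).lstrip("/")
    return str(PurePosixPath(root) / relative / "memory.current")


if __name__ == "__main__":
    # Values copied from the shell session in the description.
    wrapper = "0::/user.slice/user-1000.slice/session-5424.scope\n"
    child = ("0::/user.slice/user-1000.slice/user@1000.service"
             "/user.slice/singularity-60151.scope\n")
    print("wrapper cgroup:", cgroup_v2_path(wrapper))
    print("child stats file:", memory_current_file(child))
```

With the values from the session above, the wrapper (60133) and the contained process (60151) resolve to different cgroups, so monitoring the wrapper's pid would read the login session's usage rather than the container's.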


Subtasks 1 (0 open, 1 closed)

Task #20870: Review 20756-singularity-cgroups | Resolved | Tom Clegg | 10/03/2024

Related issues 2 (0 open, 2 closed)

Has duplicate Arvados - Bug #22050: Pid() did not return a process ID (bug in singularity support?) | Duplicate | Tom Clegg
Blocked by Arvados - Bug #17244: Make sure cgroupsV2 works with Arvados | Resolved | Tom Clegg | 07/18/2023
Actions #1

Updated by Tom Clegg over 1 year ago

  • Target version set to Future
  • Category set to Docker
  • Tracker changed from Bug to Feature
Actions #2

Updated by Tom Clegg over 1 year ago

  • Blocked by Bug #17244: Make sure cgroupsV2 works with Arvados added
Actions #3

Updated by Brett Smith over 1 year ago

Tom Clegg wrote:

The docs say "the --apply-cgroups option can only be used with root privileges" but these tests worked as a non-root user:

  1. It's possible that line was written before user namespaces were widely available/enabled, and has become obsolete since. The timeline kinda works: Singularity 3.0.0 was released October 2018, and Debian got user namespaces in 11, released August 2021.
  2. But also, if you're going through starter-suid, don't you have root privileges at some level?
Actions #4

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Future to Development 2023-08-30
  • Assigned To set to Tom Clegg
Actions #5

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-08-30 to Development 2023-09-13 sprint
Actions #6

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-09-13 sprint to Development 2023-09-27 sprint
Actions #7

Updated by Peter Amstutz over 1 year ago

  • Story points set to 2.0
Actions #8

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-09-27 sprint to To be scheduled
Actions #9

Updated by Peter Amstutz 10 months ago

  • Target version changed from To be scheduled to Future
Actions #10

Updated by Peter Amstutz 3 months ago

  • Related to Bug #22050: Pid() did not return a process ID (bug in singularity support?) added
Actions #11

Updated by Tom Clegg 3 months ago

  • Target version changed from Future to Development 2024-10-09 sprint
  • Status changed from New to In Progress

20756-singularity-cgroups @ 896ab4b3b411f532da98e874d150b4836416172c -- developer-run-tests: #4471

Tests pass with
  • singularity 3.10.4 on debian 11 (jenkins)
  • singularity 3.10.4 on debian 12
  • singularity 4.2.1 on debian 12

Singularity cgroup support has a number of dependencies (kernel ≥ 4.15, systemd ≥ 224, ... see https://docs.sylabs.io/guides/latest/user-guide/cgroups.html).

I've added some code to skip applying resource limits if we can tell they're not supported (in particular, the default debian install allows memory limits but not cpu limits). But I think we need to do one or more of:
  • also check that systemd appears to be installed/working
  • add upgrade/install docs that advise running singularity run --cpus 1 --memory 10000000 busybox echo OK to ensure compatibility
  • add a config entry to disable singularity resource limits
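
As a rough illustration of the "skip limits that aren't supported" idea, the Python sketch below decides which of the --cpus / --memory flags to pass based on the contents of a cgroup v2 cgroup.controllers file. Which cgroup.controllers file to read, and the function names, are assumptions made for this example; the branch's actual detection logic may differ.

```python
def supported_limits(cgroup_controllers_text: str) -> dict:
    """Map each limit type to whether its cgroup v2 controller is listed.

    cgroup_controllers_text is the content of a cgroup.controllers file,
    a space-separated list of controller names such as "memory pids".
    """
    enabled = set(cgroup_controllers_text.split())
    return {"memory": "memory" in enabled, "cpu": "cpu" in enabled}


def limit_flags(cgroup_controllers_text: str, cpus: int, memory_bytes: int) -> list:
    """Return only the singularity limit flags whose controller is available."""
    supported = supported_limits(cgroup_controllers_text)
    flags = []
    if supported["cpu"]:
        flags += ["--cpus", str(cpus)]
    if supported["memory"]:
        flags += ["--memory", str(memory_bytes)]
    return flags


if __name__ == "__main__":
    # Hypothetical file content where memory is delegated but cpu is not.
    print(limit_flags("memory pids\n", cpus=1, memory_bytes=10000000))
```

When neither controller is listed, the returned list is empty and the container would run without limits.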
Actions #12

Updated by Tom Clegg 3 months ago

  • Related to deleted (Bug #22050: Pid() did not return a process ID (bug in singularity support?))
Actions #13

Updated by Tom Clegg 3 months ago

  • Has duplicate Bug #22050: Pid() did not return a process ID (bug in singularity support?) added
Actions #14

Updated by Tom Clegg 3 months ago

20756-singularity-cgroups @ 0fc35e3fcda0dcae3ddf48053e1b26f244d61cea -- developer-run-tests: #4474
retry fuse: developer-run-tests-doc-pysdk-api-fuse: #424
  • All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
    • ✅ singularity applies cpu/RAM limits if (as far as we can tell) that is supported
    • ✅ this also enables crunchstat resource tracking
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • If this branch looks good, then we should probably update the compute image build script to run singularity run --cpus 1 --memory 10000000 busybox echo OK (advice welcome)
    • Added #22161 to update our recommended singularity version from 3.x to 4.x (I tried 4.x in case it changed the cgroup behavior and it didn't seem to change anything at all -- I don't want to creep scope here, but we should probably update to 4.x if we have no reason to stay on 3.x)
  • Code is tested and passing, both automated and manual, what manual testing was done is described.
    • automated tests pass on debian 11 and 12
  • New or changed UX/UI and has gotten feedback from stakeholders.
    • N/A
  • Documentation has been updated.
    • N/A
  • Behaves appropriately at the intended scale (describe intended scale).
    • N/A
  • Considered backwards and forwards compatibility issues between client and server.
    • If the new systemd/cgroup compatibility tests are negative, we fall back to the previous behavior (no resource limits, no crunchstat tracking)
  • Follows our coding standards and GUI style guidelines.

Feedback welcome on whether systemd-run --wait --user true is a reasonable way to probe for a working systemd setup.

Actions #15

Updated by Brett Smith 3 months ago

Tom Clegg wrote in #note-14:

Feedback welcome on whether systemd-run --wait --user true is a reasonable way to probe for a working systemd setup.

The "standard" test is to just check whether /run/systemd/system is a directory. I guess if you wanted to be super extra sure that a user daemon was running, you could check $XDG_RUNTIME_DIR/systemd/user instead.

Actions #16

Updated by Peter Amstutz 3 months ago

  • Release set to 70
Actions #17

Updated by Tom Clegg 3 months ago

20756-singularity-cgroups @ a0ee23096d91d2f9d010c5338df8a0c82daddcf8 -- developer-run-tests: #4478
  • update systemd test to check that $XDG_RUNTIME_DIR/systemd is a directory and DBUS_SESSION_BUS_ADDRESS is set, instead of invoking systemd-run.
  • don't do the systemd test if running as root (this avoids introducing unnecessary dependencies/checks in a cloud config).
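
A minimal Python sketch of the check described in the two points above, for illustration only (the real change lives in crunch-run, and the function name and injectable parameters here are hypothetical). It assumes that "don't do the systemd test if running as root" means the check is treated as passing for root.

```python
import os


def systemd_user_session_ok(environ=None, euid=None, isdir=os.path.isdir) -> bool:
    """Guess whether a systemd user session is usable for cgroup setup.

    environ, euid and isdir are injectable so the logic can be exercised
    without touching the real environment or filesystem.
    """
    environ = os.environ if environ is None else environ
    euid = os.geteuid() if euid is None else euid
    if euid == 0:
        # Assumption: as root the user-session check is skipped entirely.
        return True
    runtime_dir = environ.get("XDG_RUNTIME_DIR", "")
    if not runtime_dir or not isdir(os.path.join(runtime_dir, "systemd")):
        return False
    # A session bus address must also be set.
    return bool(environ.get("DBUS_SESSION_BUS_ADDRESS"))


if __name__ == "__main__":
    print("systemd user session usable:", systemd_user_session_ok())
```

Unlike invoking systemd-run, this only inspects the environment and one directory, so it starts no processes and has no dependency on systemd tooling being on the PATH.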
Actions #18

Updated by Lucas Di Pentima 3 months ago

Branch 20756-singularity-cgroups at a0ee230 LGTM, thanks.

Actions #19

Updated by Tom Clegg 3 months ago

  • Status changed from In Progress to Resolved