Bug #12991

[crunch2] Propagate memory limit from runtime constraints to docker container

Added by Tom Clegg about 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
02/06/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0
Release relationship:
Auto

Description

Current behavior: When a container tries to use more memory than it asked for, it competes with system processes, and the kernel OOM-killer sometimes kills system processes instead of the container.

Desired behavior: when a container tries to allocate more memory than specified in runtime_constraints, allocation fails and/or the container is killed. System processes (including crunch-run and slurmd) are not killed.

Explanation: We use the memory and cpu figures in container runtime_constraints to choose an appropriate node to run a container on (even taking kernel/system overhead into account), but we don't tell docker to limit the the container's memory use.

Proposed solution: We have an opportunity to do this in source:services/crunch-run/crunchrun.go L918:

        Resources: dockercontainer.Resources{
            CgroupParent: runner.setCgroupParent,
        },

(dockercontainer.Resources also has Memory and NanoCPUs fields)

The container's memory size (including swap) should be limited to the number of bytes given in runtime_constraints.


Subtasks

Task #13028: Review 12991-docker-memory-limitClosedTom Clegg

Associated revisions

Revision f309b872
Added by Tom Clegg about 3 years ago

Merge branch '12991-docker-memory-limit'

fixes #12991

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Peter Amstutz about 3 years ago

For reference, crunch-job (crunch1) logic takes MemTotal from /proc/meminfo and deflates it by 5% to get MEMLIMIT used for the container. It also sets SWAPLIMIT=MEMLIMIT+SWAP

I feel like it was important to set the swap limit along with the memory limit, but I don't quite remember the details now.

The right thing to do is probably to set the memory limit to the actual amount requested by the user. This would discourage behavior such as asking for 4 GB, getting a 7GB node, and actually using 6 GB of memory.

#2 Updated by Tom Clegg about 3 years ago

  • Description updated (diff)

#3 Updated by Tom Clegg about 3 years ago

Peter Amstutz wrote:

I feel like it was important to set the swap limit along with the memory limit, but I don't quite remember the details now.

ISTR "limit memory to X" is implemented as if swap space is free/unlimited, so (if we don't also limit swap) a 1G-limited container can allocate 2G of memory as long as 1G can be swapped to disk.

We want allocations above 1G to fail.

The right thing to do is probably to set the memory limit to the actual amount requested by the user. This would discourage behavior such as asking for 4 GB, getting a 7GB node, and actually using 6 GB of memory.

Yes, absolutely. Avoiding OOM-kills is important, but this is also about making the container's runtime environment as predictable as possible, so "no OOM-kill today" is a better predictor of "no OOM-kill tomorrow".

#4 Updated by Tom Morris about 3 years ago

  • Target version set to Arvados Future Sprints
  • Story points set to 1.0

#6 Updated by Tom Morris about 3 years ago

  • Target version changed from Arvados Future Sprints to 2018-02-14 Sprint

#7 Updated by Tom Clegg about 3 years ago

  • Assigned To set to Tom Clegg

#8 Updated by Tom Clegg about 3 years ago

12991-docker-memory-limit @ 2bae361432ff8f975803495b4fd0acfae02cd3ef

#9 Updated by Tom Clegg about 3 years ago

  • Status changed from New to In Progress

#10 Updated by Lucas Di Pentima about 3 years ago

This LGTM, thanks.

Test run successful: https://ci.curoverse.com/job/developer-run-tests/566/

#11 Updated by Anonymous about 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

#12 Updated by Tom Morris over 2 years ago

  • Release set to 17

Also available in: Atom PDF