Bug #12991

[crunch2] Propagate memory limit from runtime constraints to docker container

Added by Tom Clegg over 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tom Clegg
Category: -
Target version: 2018-02-14 Sprint
Story points: 1.0
Release relationship: Auto

Description

Current behavior: When a container tries to use more memory than it asked for, it competes with system processes, and the kernel OOM-killer sometimes kills system processes instead of the container.

Desired behavior: when a container tries to allocate more memory than specified in runtime_constraints, allocation fails and/or the container is killed. System processes (including crunch-run and slurmd) are not killed.

Explanation: We use the memory and CPU figures in container runtime_constraints to choose an appropriate node to run a container on (even taking kernel/system overhead into account), but we don't tell docker to limit the container's memory use.

Proposed solution: We have an opportunity to do this in source:services/crunch-run/crunchrun.go L918:

        Resources: dockercontainer.Resources{
            CgroupParent: runner.setCgroupParent,
        },

(dockercontainer.Resources also has Memory and NanoCPUs fields)

The container's memory size (including swap) should be limited to the number of bytes given in runtime_constraints.
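A minimal sketch of what that could look like (assuming the requested RAM is available to crunch-run in bytes as runner.Container.RuntimeConstraints.RAM; the exact field name is illustrative, not a final patch):

        Resources: dockercontainer.Resources{
            CgroupParent: runner.setCgroupParent,
            // Hard RAM limit, in bytes, taken from the container request.
            Memory: runner.Container.RuntimeConstraints.RAM,
            // Combined memory+swap limit; setting it equal to Memory means
            // the container gets no extra swap, so allocations beyond the
            // requested RAM fail instead of spilling to disk.
            MemorySwap: runner.Container.RuntimeConstraints.RAM,
        },

Whether swap should be disallowed entirely or capped at some small margin is a judgment call; the comments below lean toward making allocations above the requested RAM fail outright.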


Subtasks 1 (0 open, 1 closed)

Task #13028: Review 12991-docker-memory-limit (Closed, Tom Clegg, 02/06/2018)
#1

Updated by Peter Amstutz over 6 years ago

For reference, the crunch-job (crunch1) logic takes MemTotal from /proc/meminfo and deflates it by 5% to get the MEMLIMIT used for the container. It also sets SWAPLIMIT=MEMLIMIT+SWAP.
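An illustrative Go sketch of that crunch1 arithmetic (crunch-job itself is Perl; the helper name here is made up for the example):

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    // meminfoBytes returns a /proc/meminfo field (e.g. "MemTotal") in bytes.
    func meminfoBytes(field string) int64 {
        f, err := os.Open("/proc/meminfo")
        if err != nil {
            return 0
        }
        defer f.Close()
        scanner := bufio.NewScanner(f)
        for scanner.Scan() {
            parts := strings.Fields(scanner.Text())
            if len(parts) >= 2 && parts[0] == field+":" {
                kb, _ := strconv.ParseInt(parts[1], 10, 64)
                return kb * 1024 // /proc/meminfo reports kB
            }
        }
        return 0
    }

    func main() {
        memTotal := meminfoBytes("MemTotal")
        swapTotal := meminfoBytes("SwapTotal")
        memLimit := memTotal * 95 / 100   // MEMLIMIT: MemTotal deflated by 5%
        swapLimit := memLimit + swapTotal // SWAPLIMIT = MEMLIMIT + SWAP
        fmt.Println("MEMLIMIT:", memLimit, "SWAPLIMIT:", swapLimit)
    }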

I feel like it was important to set the swap limit along with the memory limit, but I don't quite remember the details now.

The right thing to do is probably to set the memory limit to the actual amount requested by the user. This would discourage behavior such as asking for 4 GB, getting a 7GB node, and actually using 6 GB of memory.

#2

Updated by Tom Clegg about 6 years ago

  • Description updated (diff)
#3

Updated by Tom Clegg about 6 years ago

Peter Amstutz wrote:

I feel like it was important to set the swap limit along with the memory limit, but I don't quite remember the details now.

ISTR "limit memory to X" is implemented as if swap space is free/unlimited, so (if we don't also limit swap) a 1G-limited container can allocate 2G of memory as long as 1G can be swapped to disk.

We want allocations above 1G to fail.
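(A hypothetical way to see the difference: run something like the following inside a container limited to 1G. With only the memory limit set and swap available on the node, it keeps allocating by paging to swap; with the memory+swap limit set to the same value, it is OOM-killed shortly after the 1G mark.)

    package main

    import "fmt"

    func main() {
        // Allocate and touch memory in 100 MiB chunks until the kernel
        // stops us; with both limits at 1G this should die shortly after
        // printing "holding 1000 MiB".
        const chunk = 100 << 20
        var held [][]byte
        for i := 1; ; i++ {
            buf := make([]byte, chunk)
            for j := range buf {
                buf[j] = 1 // touch every page so it is actually resident
            }
            held = append(held, buf)
            fmt.Printf("holding %d MiB\n", i*100)
        }
    }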

The right thing to do is probably to set the memory limit to the actual amount requested by the user. This would discourage behavior such as asking for 4 GB, getting a 7GB node, and actually using 6 GB of memory.

Yes, absolutely. Avoiding OOM-kills is important, but this is also about making the container's runtime environment as predictable as possible, so "no OOM-kill today" is a better predictor of "no OOM-kill tomorrow".

#4

Updated by Tom Morris about 6 years ago

  • Target version set to Arvados Future Sprints
  • Story points set to 1.0
#6

Updated by Tom Morris about 6 years ago

  • Target version changed from Arvados Future Sprints to 2018-02-14 Sprint
#7

Updated by Tom Clegg about 6 years ago

  • Assigned To set to Tom Clegg
#8

Updated by Tom Clegg about 6 years ago

12991-docker-memory-limit @ 2bae361432ff8f975803495b4fd0acfae02cd3ef

#9

Updated by Tom Clegg about 6 years ago

  • Status changed from New to In Progress
#10

Updated by Lucas Di Pentima about 6 years ago

This LGTM, thanks.

Test run successful: https://ci.curoverse.com/job/developer-run-tests/566/

#11

Updated by Anonymous about 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
#12

Updated by Tom Morris almost 6 years ago

  • Release set to 17