Bug #12991
Updated by Tom Clegg almost 7 years ago
Current behavior: When a container tries to use more memory than it asked for, it competes with system processes, and the kernel OOM-killer sometimes kills system processes instead of the container.

Desired behavior: When a container tries to allocate more memory than specified in runtime_constraints, allocation fails and/or the container is killed. System processes (including crunch-run and slurmd) are not killed.

Explanation: We use the memory and cpu figures in container runtime_constraints to choose an appropriate node to run a container on (even taking kernel/system overhead into account), but we don't tell docker to limit the container's memory use. As a result, a container can use more memory than it asked for.

Proposed solution: We have an opportunity to do this in source:services/crunch-run/crunchrun.go L918:

<pre><code class="go">
Resources: dockercontainer.Resources{
	CgroupParent: runner.setCgroupParent,
},
</code></pre>

(dockercontainer.Resources also has Memory and NanoCPUs fields.)

The container's memory size (including swap) should be limited to the number of bytes given in runtime_constraints.
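A rough sketch of what the limit could look like, not the actual crunch-run change. To keep it runnable on its own, @resources@ below is a local stand-in that mirrors a few fields of dockercontainer.Resources (CgroupParent, Memory, MemorySwap), and @runtimeConstraints@, its @RAM@ field, and @limitsFor@ are hypothetical names made up for illustration. It relies on Docker treating MemorySwap as the total of memory plus swap, so setting it equal to Memory should leave the container no extra swap allowance.

```go
package main

import "fmt"

// runtimeConstraints is a hypothetical stand-in for the container's
// runtime_constraints. RAM is assumed to be a byte count.
type runtimeConstraints struct {
	RAM int64
}

// resources is a local stand-in mirroring a subset of the fields of
// dockercontainer.Resources, so this sketch compiles without the Docker
// client library.
type resources struct {
	CgroupParent string
	Memory       int64 // memory limit, in bytes
	MemorySwap   int64 // total memory + swap limit, in bytes
}

// limitsFor (hypothetical helper) builds the resource settings for a
// container. Memory and MemorySwap are both set to the requested RAM, so
// memory including swap is capped at the runtime_constraints figure.
func limitsFor(rc runtimeConstraints, cgroupParent string) resources {
	return resources{
		CgroupParent: cgroupParent,
		Memory:       rc.RAM,
		MemorySwap:   rc.RAM,
	}
}

func main() {
	// Example: a container that asked for 2 GiB of RAM.
	rc := runtimeConstraints{RAM: 2 << 30}
	r := limitsFor(rc, "example-parent")
	fmt.Printf("Memory=%d MemorySwap=%d CgroupParent=%q\n",
		r.Memory, r.MemorySwap, r.CgroupParent)
}
```

In crunchrun.go the same two fields would presumably be added next to CgroupParent in the existing Resources literal. NanoCPUs is left unset here because this issue is only about memory.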