Bug #12991

Updated by Tom Clegg about 6 years ago

Current behavior: When a container tries to use more memory than it asked for, it competes with system processes, and the kernel OOM-killer sometimes kills system processes instead of the container. 

 Desired behavior: When a container tries to allocate more memory than specified in runtime_constraints, allocation fails and/or the container is killed. System processes (including crunch-run and slurmd) are not killed. 

 Explanation: We use the memory and CPU figures in container runtime_constraints to choose an appropriate node to run a container on (even taking kernel/system overhead into account), but we don't tell Docker to limit the container's memory use. 

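 As a result, a container that exceeds its requested memory competes with system processes for RAM, and the kernel OOM-killer may kill one of them instead of the container. 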

 Proposed solution: We have an opportunity to do this in source:services/crunch-run/crunchrun.go L918: 

 <pre><code class="go"> 
		 Resources: dockercontainer.Resources{ 
			 CgroupParent: runner.setCgroupParent, 
		 }, 
 </code></pre> 

 (dockercontainer.Resources also has Memory and NanoCPUs fields) 

 The container's memory size (including swap) should be limited to the number of bytes given in runtime_constraints. 
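 A minimal sketch of how the limit might be filled in, not the actual crunch-run change. It assumes the memory figure from runtime_constraints is available as a byte count (the @RuntimeConstraints@ type and its @RAM@ field are hypothetical), and it uses a local @Resources@ struct as a stand-in for dockercontainer.Resources so that it compiles without the Docker client library. @MemorySwap@ is not mentioned above; in Docker's API it is the combined memory+swap limit, so setting it equal to @Memory@ caps the total at the requested size.

```go
package main

import "fmt"

// Resources is a local stand-in for the fields of dockercontainer.Resources
// used in this sketch. The real struct has many more fields.
type Resources struct {
	CgroupParent string
	Memory       int64 // memory limit, in bytes
	MemorySwap   int64 // combined memory+swap limit, in bytes
}

// RuntimeConstraints is a hypothetical view of the container's
// runtime_constraints; only the memory figure is needed here.
type RuntimeConstraints struct {
	RAM int64 // requested memory, in bytes
}

// limitedResources returns resource settings that cap memory, including
// swap, at the number of bytes requested in runtime_constraints.
func limitedResources(rc RuntimeConstraints, cgroupParent string) Resources {
	return Resources{
		CgroupParent: cgroupParent,
		Memory:       rc.RAM,
		// Same value as Memory, so memory plus swap cannot exceed rc.RAM.
		MemorySwap: rc.RAM,
	}
}

func main() {
	// Example: a container that asked for 2 GiB.
	rc := RuntimeConstraints{RAM: 2 << 30}
	fmt.Printf("%+v\n", limitedResources(rc, "docker"))
}
```

 With both fields set to the same value, a container that tries to exceed its request should hit its own cgroup limit rather than compete with crunch-run or slurmd for host memory.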
