Bug #3792

Updated by Brett Smith over 9 years ago

One of qr1hi's compute nodes recently got into a state where it would not start any more Docker containers because it could not allocate memory for them. The specific error message was:

<pre>
2014/09/03 14:06:48 Error response from daemon: Cannot start container HASH: fork/exec /tmp/docker/init/dockerinit-1.1.2: cannot allocate memory
</pre>

@free@ reported that more than 90% of RAM was free, but @ps@ showed that the Docker daemon had a large amount of memory reserved. Compute nodes are configured not to overcommit memory, so Linux would not offer this reserved-but-unused RAM to anything else. Restarting the daemon resolved the issue.
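The gap between what @free@ shows and what the kernel will actually hand out can be checked directly. A minimal sketch, assuming the nodes use strict accounting (@vm.overcommit_memory = 2@), which is our reading of "configured not to overcommit": in that mode the kernel refuses a new commitment once @Committed_AS@ would exceed @CommitLimit@ in @/proc/meminfo@, no matter how much RAM is physically free. Forking a process with a large private address space (such as the Docker daemon) has to commit a copy of that space, which plausibly explains the @fork/exec@ failure above. The helper names below are hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """kB that can still be committed before strict accounting starts refusing."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


def read_meminfo(path="/proc/meminfo"):
    with open(path) as f:
        return parse_meminfo(f.read())


# Only runs on a Linux host that exposes /proc/meminfo.
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    info = read_meminfo()
    print("MemFree:      %d kB" % info["MemFree"])
    print("CommitLimit:  %d kB" % info["CommitLimit"])
    print("Committed_AS: %d kB" % info["Committed_AS"])
    print("Headroom:     %d kB" % commit_headroom_kb(info))
```

On a node in the state described here, one would expect @MemFree@ to be high while the headroom is close to zero.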

We need to figure out a more permanent way to deal with this. One part could be to restart the Docker daemon regularly between jobs. We may also want to consider adjusting Linux's memory tunables (such as the overcommit settings) on compute nodes.
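As a starting point for the tunables discussion, here is a sketch of how the kernel's overcommit-accounting documentation describes @CommitLimit@ under strict mode: @(RAM - huge pages) * vm.overcommit_ratio / 100 + swap@, or @vm.overcommit_kbytes + swap@ when that setting is non-zero. @vm.overcommit_ratio@ defaults to 50, so a node without swap can commit only about half its RAM. The kernel works in pages; this kB version is an approximation, and the node sizes in the usage note are hypothetical.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50, hugetlb_kb=0,
                    overcommit_kbytes=0):
    """Approximate CommitLimit (kB) when vm.overcommit_memory = 2.

    overcommit_ratio and overcommit_kbytes mirror the vm.overcommit_ratio and
    vm.overcommit_kbytes sysctls; a non-zero overcommit_kbytes takes precedence.
    """
    if overcommit_kbytes:
        return overcommit_kbytes + swap_kb
    # Integer division approximates the kernel's page-based arithmetic.
    return (ram_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_kb
```

For a hypothetical 64 GiB node with no swap, @commit_limit_kb(67108864, 0)@ gives 33554432 kB (half of RAM), while raising the ratio to 95 gives 63753420 kB. Whether raising the ratio, adding swap, or restarting the daemon is the right fix for qr1hi is still open.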
