Bug #3792

[Crunch] Docker daemon grows to use all RAM, then won't start new containers

Added by Brett Smith over 9 years ago. Updated almost 7 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
-
Category:
Crunch
Target version:
-
Story points:
-

Description

One of qr1hi's compute nodes recently got into a state where it would not start any more Docker containers, because it could not allocate RAM for them. The specific error message was:

2014/09/03 14:06:48 Error response from daemon: Cannot start container HASH: fork/exec /tmp/docker/init/dockerinit-1.1.2: cannot allocate memory

free reported that more than 90% of RAM was free, but ps showed that the Docker daemon had a large amount of RAM reserved. Compute nodes are configured not to overcommit memory, so Linux wouldn't offer this reserved-but-unused RAM to anything else, including the fork/exec needed to start a new container. Restarting the daemon resolved the issue.
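To make the "free says RAM is available, but allocations fail" symptom easier to spot next time, here is a minimal sketch that compares the kernel's commit accounting instead of free memory. It assumes "configured not to overcommit" means strict accounting (vm.overcommit_memory=2), which this ticket does not state explicitly. CommitLimit and Committed_AS are real /proc/meminfo fields; the function names and the SAMPLE values are invented for illustration and are not measurements from qr1hi.

```python
def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """How much more memory can be committed before allocations start failing.

    Under strict overcommit accounting, new allocations are refused once
    Committed_AS would exceed CommitLimit, no matter how large MemFree is.
    """
    return fields["CommitLimit"] - fields["Committed_AS"]


# Invented sample: most RAM is free, yet almost nothing is left to commit.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:        15000000 kB
CommitLimit:     8000000 kB
Committed_AS:    7990000 kB
"""
```

On a compute node, the same functions could be fed the contents of /proc/meminfo directly. A small headroom alongside a large MemFree would match the state described above.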

We need a more permanent way to deal with this. One option is to restart the Docker daemon regularly between jobs. We may also want to consider adjusting Linux's memory tunables on compute nodes.
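As a rough sketch of the restart-between-jobs idea, the check below reads the daemon's virtual size from /proc/PID/status and restarts it only when it exceeds a limit. Everything here is an assumption rather than how Crunch behaves: the function names, the choice of VmSize as the metric (it only approximates committed memory), the limit, and the restart command, which the caller must supply because this ticket does not say how the daemon is managed on compute nodes.

```python
import subprocess


def vm_size_kb(status_text):
    """Extract VmSize (in kB) from the text of /proc/PID/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            return int(line.split()[1])
    return None


def should_restart(status_text, limit_kb):
    """True when the process's virtual size is known and exceeds limit_kb."""
    size = vm_size_kb(status_text)
    return size is not None and size > limit_kb


def restart_if_bloated(pid, limit_kb, restart_cmd):
    """Between jobs: restart the daemon only if it has grown past limit_kb.

    restart_cmd is a caller-supplied argument list (hypothetical here);
    it is run as-is and an error is raised if it fails.
    """
    with open("/proc/%d/status" % pid) as f:
        status_text = f.read()
    if should_restart(status_text, limit_kb):
        subprocess.run(restart_cmd, check=True)
        return True
    return False
```

Gating the restart on a threshold, instead of restarting unconditionally, keeps job turnaround short in the common case. An unconditional restart between jobs would be simpler and may be good enough.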

Actions #1

Updated by Brett Smith over 9 years ago

  • Category set to Crunch
Actions #2

Updated by Brett Smith over 9 years ago

  • Description updated (diff)
Actions #3

Updated by Brett Smith over 9 years ago

I can find several Docker issues about daemon memory use (1922, 5923, 6843), but they were all resolved before the July 22 release of 1.1.2, the version we're currently using. Next time we see this happen, we should try to collect more detailed information and file a new report.

Actions #4

Updated by Tom Clegg almost 7 years ago

  • Status changed from New to Closed