Bug #8798

[Node Manager] huge virtual memory size

Added by Peter Amstutz over 4 years ago. Updated 10 months ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

Recently ran into the "cannot fork, out of memory" error which required a restart of node manager. The memory profile was approximately 97 MiB resident memory size, and 43 GiB virtual memory size. This suggests that #8543 was successful in eliminating the egregious memory leak, but there is some other behavior that is causing unbounded growth in the virtual process size. This isn't quite as bad as before (it doesn't take up all the resident size and crash other processes on the system) but it still reaches a point where the kernel won't fork the process any more (likely due to the page table growing too large).
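To track the resident versus virtual size over time, one option is to sample the Linux /proc status file for the process. The sketch below is only an illustration and is not part of node manager; the helper names `parse_vm_status` and `read_vm_status` are hypothetical, and it assumes a Linux host where /proc/<pid>/status reports `VmSize` and `VmRSS` in kB.

```python
import re


def parse_vm_status(status_text):
    """Extract VmSize and VmRSS (in KiB) from the text of /proc/<pid>/status."""
    sizes = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


def read_vm_status(pid="self"):
    """Read the status file for a process (Linux only) and parse its sizes."""
    with open("/proc/{}/status".format(pid)) as status_file:
        return parse_vm_status(status_file.read())
```

Logging `read_vm_status(pid)` periodically for the node manager process would show whether VmSize keeps growing while VmRSS stays flat, which is the pattern described above.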

Requires further investigation. One possible suspect is threading: node manager creates and discards a huge number of threads, and if each one bumps up the virtual size by a little bit, it would add up. If this seems to be the case, consider a thread pooling solution to re-use threads.
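A minimal sketch of what thread pooling could look like, assuming the work can be expressed as short independent tasks. It uses `concurrent.futures.ThreadPoolExecutor`, which is in the standard library from Python 3.2 (Python 2.7 would need the `futures` backport). The function `poll_node` is a hypothetical stand-in and does not reflect how node manager actually structures its threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def poll_node(node_id):
    """Hypothetical short-lived task; reports which worker thread ran it."""
    return (node_id, threading.current_thread().name)


def run_with_pool(node_ids, max_workers=4):
    """Run every task on a fixed set of worker threads instead of one thread per task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and reuses the same workers throughout.
        return list(pool.map(poll_node, node_ids))
```

With a pool, the number of threads ever created is bounded by `max_workers` regardless of how many tasks run, so any per-thread growth in virtual size is bounded as well.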

Update: further research suggests this might be a problem with Python 2.7 memory management. The post below suggests upgrading to Python 3.3:

https://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html

Another thing to try is to set "export MALLOC_ARENA_MAX=1", which tells glibc to use a single malloc arena instead of creating per-thread memory pools.
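If the process is started from a Python wrapper rather than a shell, the same setting can be passed through the child's environment. This is a sketch only: `run_with_single_arena` is a hypothetical helper, and it assumes Python 3.7+ for `capture_output`. glibc reads `MALLOC_ARENA_MAX` when the process starts, so the variable has to be in place before launch rather than set from inside the running process.

```python
import os
import subprocess
import sys


def run_with_single_arena(argv):
    """Launch argv with MALLOC_ARENA_MAX=1 added to a copy of the current environment."""
    env = dict(os.environ)
    env["MALLOC_ARENA_MAX"] = "1"  # limit glibc malloc to one arena
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Demonstration: the child process sees the variable.
    child = [sys.executable, "-c", "import os; print(os.environ['MALLOC_ARENA_MAX'])"]
    print(run_with_single_arena(child).stdout.strip())
```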


Related issues

Related to Arvados - Bug #8686: [Node Manager] qr1hi nodemanager can't start if ulimit is in place (New, 03/14/2016)

History

#1 Updated by Peter Amstutz over 4 years ago

  • Subject changed from [Node Manager] huge virtual size to [Node Manager] huge virtual memory size

#2 Updated by Peter Amstutz over 4 years ago

  • Description updated (diff)

#3 Updated by Peter Amstutz over 4 years ago

  • Description updated (diff)

#4 Updated by Peter Amstutz over 4 years ago

  • Description updated (diff)

#5 Updated by Tom Morris over 3 years ago

  • Target version set to 2017-03-15 sprint

#6 Updated by Peter Amstutz over 3 years ago

  • Assigned To set to Peter Amstutz

#7 Updated by Tom Morris over 3 years ago

  • Target version changed from 2017-03-15 sprint to Arvados Future Sprints
  • Assigned To deleted (Peter Amstutz)

#8 Updated by Peter Amstutz 10 months ago

  • Status changed from New to Closed
