Bug #8798

[Node Manager] huge virtual memory size

Added by Peter Amstutz about 8 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

Recently ran into the "cannot fork, out of memory" error, which required a restart of node manager. At that point the process had approximately 97 MiB of resident memory and 43 GiB of virtual memory. This suggests that #8543 was successful in eliminating the egregious memory leak, but some other behavior is causing unbounded growth in the virtual size of the process. This isn't quite as bad as before (it doesn't consume all of the resident memory and crash other processes on the system), but it still reaches a point where the kernel won't fork the process any more (likely due to the page table growing too large).
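To track the gap between resident and virtual size over time, the two numbers can be read from /proc on Linux. The following is a minimal sketch, not part of node manager; the function names and the sample text are hypothetical, and the sample only mirrors the sizes reported above.

```python
def parse_vm_status(status_text):
    """Return VmSize and VmRSS in KiB from the text of /proc/<pid>/status."""
    sizes = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Values look like "  45088768 kB"; the number is in KiB.
            sizes[key] = int(value.split()[0])
    return sizes


def read_vm_status(pid="self"):
    """Read the current sizes for a process (Linux only)."""
    with open("/proc/{}/status".format(pid)) as f:
        return parse_vm_status(f.read())


# Hypothetical excerpt: 43 GiB virtual and 97 MiB resident, expressed in KiB.
sample = "Name:\tpython\nVmSize:\t45088768 kB\nVmRSS:\t   99328 kB\n"
sizes = parse_vm_status(sample)
ratio = sizes["VmSize"] / sizes["VmRSS"]
```

Logging `read_vm_status()` periodically from inside the process would show whether the virtual size grows steadily or in jumps.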

Requires further investigation. One possible suspect is threading: node manager creates and discards a huge number of threads, and if each one bumps up the virtual size by a little bit, it would add up. If this seems to be the case, consider a thread pooling solution to re-use threads.
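For illustration, a thread pool along these lines could look like the sketch below. It assumes Python 3's `concurrent.futures` (on Python 2.7 this would need a backport), and `poll_node` is a hypothetical stand-in for whatever short-lived work node manager hands to a thread; it is not node manager's actual code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def poll_node(node_id):
    """Hypothetical unit of work; returns the name of the thread that ran it."""
    return node_id, threading.current_thread().name


def run_with_pool(node_ids, max_workers=4):
    # The executor creates at most max_workers threads and reuses them
    # for every task, instead of starting one new thread per task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(poll_node, node_ids))


results = run_with_pool(range(100))
distinct_threads = {name for _, name in results}
```

Here 100 tasks are served by at most four threads, so the number of thread stacks ever allocated is bounded by the pool size.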

Update: further research suggests this might be a problem with Python 2.7 memory management; the following post suggests upgrading to Python 3.3:

https://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html

Another thing to try is to set "export MALLOC_ARENA_MAX=1", which tells glibc to use a single malloc arena instead of per-thread memory pools.
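Because glibc reads this variable when the process starts, it has to be in the environment before node manager is launched. A minimal sketch of doing that from a Python launcher follows; the helper names are hypothetical, and the child command here only echoes the variable rather than starting node manager.

```python
import os
import subprocess
import sys


def single_arena_env(base_env=None):
    """Return a copy of the environment with MALLOC_ARENA_MAX set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_ARENA_MAX"] = "1"
    return env


def run_child(argv):
    # The variable must be present in the child's environment at startup.
    return subprocess.run(
        argv,
        env=single_arena_env(),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )


# Demonstration only: the child prints the value it inherited.
child = run_child(
    [sys.executable, "-c", "import os; print(os.environ['MALLOC_ARENA_MAX'])"]
)
```

Whether this actually reduces virtual size for node manager would need to be confirmed by measurement.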


Related issues

Related to Arvados - Bug #8686: [Node Manager] qr1hi nodemanager can't start if ulimit is in place (Closed, 03/14/2016)
Actions #1

Updated by Peter Amstutz about 8 years ago

  • Subject changed from [Node Manager] huge virtual size to [Node Manager] huge virtual memory size
Actions #2

Updated by Peter Amstutz about 8 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 8 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz about 8 years ago

  • Description updated (diff)
Actions #5

Updated by Tom Morris about 7 years ago

  • Target version set to 2017-03-15 sprint
Actions #6

Updated by Peter Amstutz about 7 years ago

  • Assigned To set to Peter Amstutz
Actions #7

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-03-15 sprint to Arvados Future Sprints
  • Assigned To deleted (Peter Amstutz)
Actions #8

Updated by Peter Amstutz over 4 years ago

  • Status changed from New to Closed
Actions #9

Updated by Ward Vandewege over 3 years ago

  • Target version deleted (Arvados Future Sprints)