[Node Manager] huge virtual memory size
Recently ran into the "cannot fork, out of memory" error which required a restart of node manager. The memory profile was approximately 97 MiB resident memory size, and 43 GiB virtual memory size. This suggests that #8543 was successful in eliminating the egregious memory leak, but there is some other behavior that is causing unbounded growth in the virtual process size. This isn't quite as bad as before (it doesn't take up all the resident size and crash other processes on the system) but it still reaches a point where the kernel won't fork the process any more (likely due to the page table growing too large).
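To confirm whether the virtual size keeps growing while the resident size stays flat, the two figures can be sampled periodically from the running process. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status reports VmSize and VmRSS in kB; the helper names parse_status and read_memory_profile are hypothetical and not part of node manager.

```python
def parse_status(text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        # Lines of interest look like "VmSize:   45088768 kB".
        if sep and key in ("VmSize", "VmRSS") and len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields


def read_memory_profile(pid="self"):
    """Read the memory figures for a process (Linux only; assumes /proc is mounted)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_status(f.read())


if __name__ == "__main__":
    profile = read_memory_profile()
    print("virtual: %d kB, resident: %d kB" % (profile["VmSize"], profile["VmRSS"]))
```

Logging these two numbers at intervals would show whether the growth tracks thread creation or something else.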
Requires further investigation. One possible suspect is threading: node manager creates and discards a huge number of threads, and if each one bumps up the virtual size by a little bit, it would add up. If this seems to be the case, consider a thread pooling solution to re-use threads.
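As a rough illustration of the thread pooling idea, the sketch below runs many short tasks on a fixed set of worker threads instead of creating one thread per task. It assumes concurrent.futures is available (standard library in Python 3.2+, available on 2.7 through the "futures" backport); poll_node and run_with_pool are hypothetical stand-ins, not node manager's actual code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def poll_node(node_id):
    # Hypothetical stand-in for the work currently done in a throwaway thread.
    # Returns the name of the worker thread so thread re-use can be observed.
    return (node_id, threading.current_thread().name)


def run_with_pool(node_ids, max_workers=4):
    # The executor creates at most max_workers threads and re-uses them
    # for every submitted task, so the thread count stays bounded.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(poll_node, node_ids))


if __name__ == "__main__":
    results = run_with_pool(range(100))
    worker_names = set(name for _, name in results)
    print("%d tasks ran on %d threads" % (len(results), len(worker_names)))
```

With a bounded pool, any per-thread overhead in virtual memory is paid at most max_workers times rather than once per task.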
Update: further research suggests this might be a problem with Python 2.7 memory management; the suggested fix is to upgrade to Python 3.3:
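Another thing to try is to set "export MALLOC_ARENA_MAX=1", which tells glibc to use a single malloc arena instead of creating separate per-thread memory pools. Each per-thread arena reserves its own block of virtual address space, so limiting the arena count should reduce the virtual size if threading is the cause.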