Bug #8686

[Node Manager] qr1hi nodemanager can't start if ulimit is in place

Added by Nico César over 4 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Assigned To: -
Category: Node Manager
Target version:
Start date: 03/14/2016
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

This is the log:

2016-03-14_17:49:01.67386 Starting arvados-node-manager from /etc/sv/arvados-node-manager
2016-03-14_17:49:02.79028 2016-03-14 17:49:02 pykka[10577] DEBUG: Registered TimedCallBackActor (urn:uuid:e724e660-9bbc-44b6-9e31-4d954f9cfd00)
2016-03-14_17:49:02.79068 2016-03-14 17:49:02 pykka[10577] DEBUG: Starting TimedCallBackActor (urn:uuid:e724e660-9bbc-44b6-9e31-4d954f9cfd00)
2016-03-14_17:49:02.79178 2016-03-14 17:49:02 root[10577] ERROR: Uncaught exception during setup
2016-03-14_17:49:02.79199 Traceback (most recent call last):
2016-03-14_17:49:02.79213   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 112, in main
2016-03-14_17:49:02.79228     launch_pollers(config, server_calculator)
2016-03-14_17:49:02.79241   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 72, in launch_pollers
2016-03-14_17:49:02.79264     timer = TimedCallBackActor.start(poll_time / 10.0).tell_proxy()
2016-03-14_17:49:02.79279   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 99, in start
2016-03-14_17:49:02.79293     obj._start_actor_loop()
2016-03-14_17:49:02.79309   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 367, in _start_actor_loop
2016-03-14_17:49:02.79324     thread.start()
2016-03-14_17:49:02.79335   File "/usr/lib/python2.7/threading.py", line 745, in start
2016-03-14_17:49:02.79351     _start_new_thread(self.__bootstrap, ())
2016-03-14_17:49:02.79364 error: can't start new thread

The same ulimit settings work well in other clusters: ulimit -m 3145728 -s 3145728 -l 3145728 -d 3145728 -f 10240
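
One possible explanation, not confirmed in this ticket: on Linux with glibc, the default stack size for a new thread is taken from the soft RLIMIT_STACK value. In bash, `ulimit -s 3145728` is given in KiB, so it asks for roughly 3 GiB of stack per thread, and that reservation may fail on some hosts. The sketch below is a diagnostic aid only and is not code from arvnodeman or pykka. The helper names `describe_stack_limit` and `start_thread_with_small_stack` and the 512 KiB figure are hypothetical, chosen for illustration.

```python
import resource
import threading


def describe_stack_limit():
    """Return the soft RLIMIT_STACK in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


def start_thread_with_small_stack(target, stack_bytes=512 * 1024):
    """Start a thread with an explicit stack size (hypothetical helper).

    threading.stack_size() applies to threads created afterwards, so the
    previous setting is restored once the thread has been started.
    """
    previous = threading.stack_size(stack_bytes)
    try:
        thread = threading.Thread(target=target)
        thread.start()
    finally:
        threading.stack_size(previous)
    return thread


if __name__ == "__main__":
    print("soft RLIMIT_STACK (bytes):", describe_stack_limit())
    worker = start_thread_with_small_stack(lambda: None)
    worker.join()
```

If a thread starts with the explicit stack size but fails without it under the same ulimit, that would point at the `-s` value rather than the other limits. This only checks the hypothesis; it says nothing about how Node Manager itself should be changed.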


Related issues

Related to Arvados - Bug #8871: [Node Manager] Doesn't kill itself when unhandled exceptions are raised during actor setup (New)

Related to Arvados - Bug #8798: [Node Manager] huge virtual memory size (Closed)

History

#1 Updated by Nico César over 4 years ago

  • Description updated (diff)

#2 Updated by Brett Smith over 4 years ago

  • Project changed from OPS to Arvados
  • Category set to Node Manager

#3 Updated by Brett Smith about 4 years ago

  • Subject changed from [NODEMANGER] qr1hi nodemanager can't start if ulimit is in place to [Node Manager] qr1hi nodemanager can't start if ulimit is in place

#4 Updated by Brett Smith about 4 years ago

  • Target version set to Arvados Future Sprints
