Bug #8687
[Nodemanager] ComputeNodeShutdownActor dies. (closed)
Story points:
-
Description
On version 0.1.20160310205427 we got the following exception:
nodemanager.wx7k5:/etc/sv# zgrep exception -i arvados-node-manager/log/main/@4000000056e7018b0d9856cc.s -A20
2016-03-14_17:57:29.63760 2016-03-14 17:57:29 pykka[32674] ERROR: Unhandled exception in NodeManagerDaemonActor (urn:uuid:8775e92b-aa32-47a9-86b6-fdf8fd9637a6):
2016-03-14_17:57:29.63764 Traceback (most recent call last):
2016-03-14_17:57:29.63765   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 200, in _actor_loop
2016-03-14_17:57:29.63766     response = self._handle_receive(message)
2016-03-14_17:57:29.63767   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 294, in _handle_receive
2016-03-14_17:57:29.63768     return callee(*message['args'], **message['kwargs'])
2016-03-14_17:57:29.63769   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 346, in wrapper
2016-03-14_17:57:29.63770     return orig_func(self, *args, **kwargs)
2016-03-14_17:57:29.63770   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 422, in node_can_shutdown
2016-03-14_17:57:29.63771     self._begin_node_shutdown(node_actor, cancellable=True)
2016-03-14_17:57:29.63772   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 417, in _begin_node_shutdown
2016-03-14_17:57:29.63773     shutdown.tell_proxy().subscribe(self._later.node_finished_shutdown)
2016-03-14_17:57:29.63774   File "/usr/lib/python2.7/dist-packages/pykka/proxy.py", line 161, in __getattr__
2016-03-14_17:57:29.63775     raise AttributeError('%s has no attribute "%s"' % (self, name))
2016-03-14_17:57:29.63777 AttributeError: <ActorProxy for ComputeNodeShutdownActor (urn:uuid:2aad9ea9-1b07-43d2-9beb-bd26d6d22686), attr_path=()> has no attribute "tell_proxy"
2016-03-14_17:57:29.63810 2016-03-14 17:57:29 ComputeNodeShutdownActor.bd26d6d22686.compute-9ykvyjo8btbaolg-wx7k5[32674] INFO: Draining SLURM node compute0
2016-03-14_17:57:29.63829 2016-03-14 17:57:29 pykka[32674] DEBUG: Unregistered NodeManagerDaemonActor (urn:uuid:8775e92b-aa32-47a9-86b6-fdf8fd9637a6)
2016-03-14_17:57:29.66277 2016-03-14 17:57:29 ComputeNodeShutdownActor.bd26d6d22686.compute-9ykvyjo8btbaolg-wx7k5[32674] INFO: Waiting for SLURM node compute0 to drain
2016-03-14_17:57:29.69269 2016-03-14 17:57:29 ComputeNodeShutdownActor.bd26d6d22686.compute-9ykvyjo8btbaolg-wx7k5[32674] INFO: Starting shutdown
2016-03-14_17:57:30.29366 2016-03-14 17:57:30 ArvadosNodeListMonitorActor.140560520064576[32674] INFO: got response with 393 items in 0.870521783829 seconds, next poll at 2016-03-14 17:57:39
2016-03-14_17:57:32.34714 2016-03-14 17:57:32 CloudNodeListMonitorActor.140563744302144[32674] INFO: got response with 1 items in 24.9102361202 seconds, next poll at 2016-03-14 17:57:17
--
2016-03-14_18:22:26.97211 2016-03-14 18:22:26 root[32674] ERROR: Uncaught exception during setup
2016-03-14_18:22:26.97213 Traceback (most recent call last):
2016-03-14_18:22:26.97214   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 128, in main
2016-03-14_18:22:26.97214     signal.pause()
2016-03-14_18:22:26.97215   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 90, in shutdown_signal
2016-03-14_18:22:26.97215     node_daemon.shutdown()
2016-03-14_18:22:26.97216   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/baseactor.py", line 25, in __call__
2016-03-14_18:22:26.97216     self.actor_ref.tell(message)
2016-03-14_18:22:26.97217   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 437, in tell
2016-03-14_18:22:26.97217     raise _ActorDeadError('%s not found' % self)
2016-03-14_18:22:26.97218 ActorDeadError: NodeManagerDaemonActor (urn:uuid:8775e92b-aa32-47a9-86b6-fdf8fd9637a6) not found
We upgraded wx7k5 to 0.1.20160311203330, hoping that this is related to #8678 (which has presumably been fixed), but we don't know whether it is.
Feel free to mark this ticket as a duplicate if so.
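For readers trying to follow the two tracebacks, here is a rough sketch of the failure sequence they seem to show: a handler calls a method the proxy does not expose, the resulting AttributeError goes unhandled, the actor is unregistered, and a later message to it raises ActorDeadError. This is a toy stand-in written for illustration only. ToyProxy, ToyDaemon and ShutdownWorker are hypothetical names, and none of this is pykka's or arvnodeman's actual code.

```python
# Toy model of the failure sequence in the log above.
# All classes here are hypothetical stand-ins, NOT pykka or arvnodeman code.


class ActorDeadError(Exception):
    """Raised when a message is sent to an actor that has already stopped."""


class ToyProxy:
    """Forwards attribute access, but only for names the target really has."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for forwarded names.
        if not hasattr(self._target, name):
            raise AttributeError('%r has no attribute "%s"' % (self, name))
        return getattr(self._target, name)


class ShutdownWorker:
    """Stands in for the shutdown actor; it has subscribe() but no tell_proxy()."""

    def subscribe(self, callback):
        callback("done")


class ToyDaemon:
    """Stands in for the daemon actor; one bad handler call stops it for good."""

    def __init__(self):
        self.alive = True
        self.log = []

    def tell(self, method_name, *args):
        if not self.alive:
            raise ActorDeadError("ToyDaemon not found")
        try:
            getattr(self, method_name)(*args)
        except Exception as exc:
            # Assumption modelled on the log: an unhandled exception
            # unregisters the actor, so it stops accepting messages.
            self.log.append("Unhandled exception: %s" % exc)
            self.alive = False

    def begin_shutdown_buggy(self, worker_proxy):
        # Mirrors the failing call shape: tell_proxy is not on the proxy.
        worker_proxy.tell_proxy().subscribe(self.log.append)

    def begin_shutdown_ok(self, worker_proxy):
        # Uses only a method the proxy actually forwards.
        worker_proxy.subscribe(self.log.append)

    def shutdown(self):
        self.alive = False


daemon = ToyDaemon()
daemon.tell("begin_shutdown_buggy", ToyProxy(ShutdownWorker()))
try:
    daemon.tell("shutdown")  # analogous to the later signal handler call
except ActorDeadError as exc:
    print("later shutdown request fails:", exc)
```

In this model the second error is only a consequence of the first: once the daemon has stopped, any later message to it fails, which would explain why the ActorDeadError appears 25 minutes after the AttributeError.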
Updated by Nico César almost 9 years ago
- Subject changed from [NODEMANAGER] NodeManagerDaemonActor dies. to [NODEMANAGER] ComputeNodeShutdownActor dies.
Updated by Ward Vandewege almost 9 years ago
- Subject changed from [NODEMANAGER] ComputeNodeShutdownActor dies. to [Nodemanager] ComputeNodeShutdownActor dies.
Updated by Ward Vandewege almost 9 years ago
- Status changed from New to Resolved
- Target version set to 2016-03-16 sprint
We think this was fixed in 94b8484.