Bug #8687

Updated by Nico César over 8 years ago

On version 0.1.20160310205427 we got the following exception:

 <pre> 
 nodemanager.wx7k5:/etc/sv# zgrep exception -i arvados-node-manager/log/main/@4000000056e7018b0d9856cc.s -A20 
 2016-03-14_17:57:29.63760 2016-03-14 17:57:29 pykka[32674] ERROR: Unhandled exception in NodeManagerDaemonActor (urn:uuid:8775e92b-aa32-47a9-86b6-fdf8fd9637a6): 
 2016-03-14_17:57:29.63764 Traceback (most recent call last): 
 2016-03-14_17:57:29.63765     File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 200, in _actor_loop 
 2016-03-14_17:57:29.63766       response = self._handle_receive(message) 
 2016-03-14_17:57:29.63767     File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 294, in _handle_receive 
 2016-03-14_17:57:29.63768       return callee(*message['args'], **message['kwargs']) 
 2016-03-14_17:57:29.63769     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 346, in wrapper 
 2016-03-14_17:57:29.63770       return orig_func(self, *args, **kwargs) 
 2016-03-14_17:57:29.63770     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 422, in node_can_shutdown 
 2016-03-14_17:57:29.63771       self._begin_node_shutdown(node_actor, cancellable=True) 
 2016-03-14_17:57:29.63772     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 417, in _begin_node_shutdown 
 2016-03-14_17:57:29.63773       shutdown.tell_proxy().subscribe(self._later.node_finished_shutdown) 
 2016-03-14_17:57:29.63774     File "/usr/lib/python2.7/dist-packages/pykka/proxy.py", line 161, in __getattr__ 
 2016-03-14_17:57:29.63775       raise AttributeError('%s has no attribute "%s"' % (self, name)) 
 2016-03-14_17:57:29.63777 AttributeError: <ActorProxy for ComputeNodeShutdownActor (urn:uuid:2aad9ea9-1b07-43d2-9beb-bd26d6d22686), attr_path=()> has no attribute "tell_proxy" 
 2016-03-14_17:57:29.63810 2016-03-14 17:57:29 ComputeNodeShutdownActor.bd26d6d22686.compute-9ykvyjo8btbaolg-wx7k5[32674] INFO: Draining SLURM node compute0 
 2016-03-14_17:57:29.63829 2016-03-14 17:57:29 pykka[32674] DEBUG: Unregistered NodeManagerDaemonActor (urn:uuid:8775e92b-aa32-47a9-86b6-fdf8fd9637a6) 
 2016-03-14_17:57:29.66277 2016-03-14 17:57:29 ComputeNodeShutdownActor.bd26d6d22686.compute-9ykvyjo8btbaolg-wx7k5[32674] INFO: Waiting for SLURM node compute0 to drain 
 2016-03-14_17:57:29.69269 2016-03-14 17:57:29 ComputeNodeShutdownActor.bd26d6d22686.compute-9ykvyjo8btbaolg-wx7k5[32674] INFO: Starting shutdown 
 2016-03-14_17:57:30.29366 2016-03-14 17:57:30 ArvadosNodeListMonitorActor.140560520064576[32674] INFO: got response with 393 items in 0.870521783829 seconds, next poll at 2016-03-14 17:57:39 
 2016-03-14_17:57:32.34714 2016-03-14 17:57:32 CloudNodeListMonitorActor.140563744302144[32674] INFO: got response with 1 items in 24.9102361202 seconds, next poll at 2016-03-14 17:57:17 
 -- 
 2016-03-14_18:22:26.97211 2016-03-14 18:22:26 root[32674] ERROR: Uncaught exception during setup 
 2016-03-14_18:22:26.97213 Traceback (most recent call last): 
 2016-03-14_18:22:26.97214     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 128, in main 
 2016-03-14_18:22:26.97214       signal.pause() 
 2016-03-14_18:22:26.97215     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 90, in shutdown_signal 
 2016-03-14_18:22:26.97215       node_daemon.shutdown() 
 2016-03-14_18:22:26.97216     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/baseactor.py", line 25, in __call__ 
 2016-03-14_18:22:26.97216       self.actor_ref.tell(message) 
 2016-03-14_18:22:26.97217     File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 437, in tell 
 2016-03-14_18:22:26.97217       raise _ActorDeadError('%s not found' % self) 
 2016-03-14_18:22:26.97218 ActorDeadError: NodeManagerDaemonActor (urn:uuid:8775e92b-aa32-47a9-86b6-fdf8fd9637a6) not found 

 </pre> 
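For context, our reading of the first traceback: pykka's <code>ActorProxy.__getattr__</code> only exposes attributes defined on the actor itself, so calling a helper that lives on the ActorRef (such as arvnodeman's <code>tell_proxy()</code>) through a proxy raises AttributeError, and the unhandled exception kills the NodeManagerDaemonActor. A minimal sketch of that failure mode (the <code>ShutdownActor</code> class below is a hypothetical stand-in, not the real ComputeNodeShutdownActor):

<pre>
import pykka

class ShutdownActor(pykka.ThreadingActor):
    # Hypothetical stand-in for ComputeNodeShutdownActor; any actor works.
    def cancel_shutdown(self):
        return 'cancelled'

# start() returns an ActorRef; proxy() wraps it in an ActorProxy.
proxy = ShutdownActor.start().proxy()
try:
    # tell_proxy() is not an attribute of the actor, so the proxy's
    # __getattr__ raises AttributeError -- same as the traceback above.
    proxy.tell_proxy()
except AttributeError as exc:
    print(exc)
finally:
    pykka.ActorRegistry.stop_all()
</pre>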
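The second traceback looks like a downstream consequence: once the AttributeError killed the daemon actor (note the "Unregistered NodeManagerDaemonActor" line above), the signal handler's <code>node_daemon.shutdown()</code> ends up telling a dead actor, which pykka reports as ActorDeadError. A minimal sketch of that behavior (generic actor, not the real daemon):

<pre>
import pykka

class Daemon(pykka.ThreadingActor):
    pass

ref = Daemon.start()
ref.stop()  # actor unregisters, like the crashed daemon actor
try:
    ref.tell({'command': 'shutdown'})  # tell() on a dead actor
except pykka.ActorDeadError as exc:
    print(exc)  # "... not found", matching the second traceback
</pre>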


We upgraded wx7k5 and qr1hi to 0.1.20160311203330, hoping this is related to #8678 (which has presumably been fixed), but we don't know.

Feel free to mark this ticket as a duplicate if so.
