Bug #8913

Updated by Nico César about 8 years ago

This happened in qr2hi. (I don't know whether these exceptions are the cause of the manager being wedged.) I restarted the service and the nodes were created.


 <pre> 
 # grep Traceback arvados-node-manager/log/main/current    -A28 
 2016-04-08_18:00:17.44134 Traceback (most recent call last): 
 2016-04-08_18:00:17.44134     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 128, in main 
 2016-04-08_18:00:17.44135       signal.pause() 
 2016-04-08_18:00:17.44136     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 90, in shutdown_signal 
 2016-04-08_18:00:17.44136       node_daemon.shutdown() 
 2016-04-08_18:00:17.44136     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/baseactor.py", line 25, in __call__ 
 2016-04-08_18:00:17.44137       self.actor_ref.tell(message) 
 2016-04-08_18:00:17.44137     File "/usr/local/lib/python2.7/dist-packages/pykka/actor.py", line 398, in tell 
 2016-04-08_18:00:17.44137       raise ActorDeadError('%s not found' % self) 
 2016-04-08_18:00:17.44137 ActorDeadError: NodeManagerDaemonActor (urn:uuid:e9844486-0662-4b73-bc46-8e64f57ac168) not found 
 2016-04-08_18:00:17.44211 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:1c85ed8e-3b54-43fb-80eb-9cd3a5a9738f) 
 2016-04-08_18:00:17.44212 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:1c85ed8e-3b54-43fb-80eb-9cd3a5a9738f) 
 2016-04-08_18:00:17.44232 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:2bce315f-39a6-4daa-9027-acd3850e742e) 
 2016-04-08_18:00:17.44239 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:2bce315f-39a6-4daa-9027-acd3850e742e) 
 2016-04-08_18:00:17.44307 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:6ea5fdd4-6cf8-4a35-bba5-d45bb64195c7) 
 2016-04-08_18:00:17.44308 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:6ea5fdd4-6cf8-4a35-bba5-d45bb64195c7) 
 2016-04-08_18:00:17.44328 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:d9b7b106-1eae-4d5d-a86d-2aac9d334035) 
 2016-04-08_18:00:17.44333 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:d9b7b106-1eae-4d5d-a86d-2aac9d334035) 
 2016-04-08_18:00:17.44562 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:1b2032d8-1698-4d07-90a2-92fc166301cd) 
 2016-04-08_18:00:17.44563 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:1b2032d8-1698-4d07-90a2-92fc166301cd) 
 2016-04-08_18:00:17.44594 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:a3f8ae7a-fd6d-4366-a5c0-9dfc99aa8672) 
 2016-04-08_18:00:17.44602 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:a3f8ae7a-fd6d-4366-a5c0-9dfc99aa8672) 
 2016-04-08_18:00:17.44660 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:aaee2dfd-366a-4025-8799-70f82053ea68) 
 2016-04-08_18:00:17.44662 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:aaee2dfd-366a-4025-8799-70f82053ea68) 
 2016-04-08_18:00:17.44685 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:ae7b61a0-a9b3-424c-98fe-be7f02f9593c) 
 2016-04-08_18:00:17.44694 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:ae7b61a0-a9b3-424c-98fe-be7f02f9593c) 
 2016-04-08_18:00:17.44744 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:670df52f-e04d-46c2-96f8-ec56cf02f833) 
 2016-04-08_18:00:17.44745 2016-04-08 18:00:17 pykka[29660] DEBUG: Stopped ComputeNodeMonitorActor (urn:uuid:670df52f-e04d-46c2-96f8-ec56cf02f833) 
 2016-04-08_18:00:17.44786 2016-04-08 18:00:17 pykka[29660] DEBUG: Unregistered ComputeNodeMonitorActor (urn:uuid:8ae760f3-6394-4838-bfa6-079f2ad8643a) 
 -- 
 2016-04-08_18:01:48.93823 Traceback (most recent call last): 
 2016-04-08_18:01:48.93823     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/dispatch/__init__.py", line 281, in throttle_wrapper 
 2016-04-08_18:01:48.93824       result = orig_func(self, *args, **kwargs) 
 2016-04-08_18:01:48.93824     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/dispatch/__init__.py", line 296, in sync_node 
 2016-04-08_18:01:48.93824       return self._cloud.sync_node(cloud_node, arvados_node) 
 2016-04-08_18:01:48.93825     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/gce.py", line 149, in sync_node 
 2016-04-08_18:01:48.93825       method='POST', data=metadata_req) 
 2016-04-08_18:01:48.93825     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py", line 937, in async_request 
 2016-04-08_18:01:48.93826       response = request(**kwargs) 
 2016-04-08_18:01:48.93826     File "/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py", line 120, in request 
 2016-04-08_18:01:48.93826       response = super(GCEConnection, self).request(*args, **kwargs) 
 2016-04-08_18:01:48.93827     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py", line 692, in request 
 2016-04-08_18:01:48.93827       *args, **kwargs) 
 2016-04-08_18:01:48.93828     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py", line 799, in request 
 2016-04-08_18:01:48.93828       response = responseCls(**kwargs) 
 2016-04-08_18:01:48.93828     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py", line 145, in __init__ 
 2016-04-08_18:01:48.93829       self.object = self.parse_body() 
 2016-04-08_18:01:48.93829     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py", line 253, in parse_body 
 2016-04-08_18:01:48.93829       raise GoogleBaseError(message, self.status, code) 
 2016-04-08_18:01:48.93830 GoogleBaseError: u'Supplied fingerprint does not match current metadata fingerprint.' 
 2016-04-08_18:01:49.63636 2016-04-08 18:01:49 JobQueueMonitorActor.38234512[17035] DEBUG: sending request 
 2016-04-08_18:01:49.64174 2016-04-08 18:01:49 CloudNodeListMonitorActor.30580448[17035] DEBUG: sending request 
 2016-04-08_18:01:49.64681 2016-04-08 18:01:49 ArvadosNodeListMonitorActor.35361056[17035] DEBUG: sending request 
 2016-04-08_18:01:49.75908 2016-04-08 18:01:49 JobQueueMonitorActor.38234512[17035] DEBUG: Calculated wishlist: n1-standard-8, n1-standard-8, n1-standard-8, n1-standard-8, n1-standard-8, n1-standard-8, n1-standard-8, n1-standard-8, n1-standard-8, n1-standard-8 
 2016-04-08_18:01:49.75920 2016-04-08 18:01:49 JobQueueMonitorActor.38234512[17035] INFO: got response with 1 items in 0.124391078949 seconds, next poll at 2016-04-08 18:01:59 
 2016-04-08_18:01:49.75970 2016-04-08 18:01:49 NodeManagerDaemonActor.49c466c95e79[17035] INFO: n1-highmem-32: wishlist 0, up 0 (booting 0, idle 0, busy 0), shutting down 0 
 2016-04-08_18:01:49.76002 2016-04-08 18:01:49 NodeManagerDaemonActor.49c466c95e79[17035] INFO: n1-standard-32: wishlist 0, up 0 (booting 0, idle 0, busy 0), shutting down 0 
 2016-04-08_18:01:49.76020 2016-04-08 18:01:49 NodeManagerDaemonActor.49c466c95e79[17035] INFO: n1-highmem-16: wishlist 0, up 0 (booting 0, idle 0, busy 0), shutting down 0 
 2016-04-08_18:01:49.76037 2016-04-08 18:01:49 NodeManagerDaemonActor.49c466c95e79[17035] INFO: n1-standard-16: wishlist 0, up 0 (booting 0, idle 0, busy 0), shutting down 0 
 -- 
 </pre>
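
The second traceback ("Supplied fingerprint does not match current metadata fingerprint") is the error GCE returns when a metadata write carries a stale fingerprint, i.e. something else changed the metadata between the read and the write. A common way to handle that kind of optimistic-locking conflict is to re-read the fingerprint and retry. The sketch below only illustrates that pattern; it is not the arvnodeman or libcloud code. @FakeMetadataStore@, @FingerprintMismatch@ and @set_metadata_with_retry@ are hypothetical names, and the store is an in-memory stand-in rather than a real GCE call.

```python
class FingerprintMismatch(Exception):
    """Hypothetical stand-in for the GoogleBaseError in the traceback."""


class FakeMetadataStore:
    """Hypothetical in-memory stand-in for one instance's metadata."""

    def __init__(self):
        self.items = {}
        self.version = 0

    def read(self):
        # Return a copy of the items plus the current fingerprint.
        return dict(self.items), str(self.version)

    def write(self, items, fingerprint):
        # Reject the write if the caller's fingerprint is stale.
        if fingerprint != str(self.version):
            raise FingerprintMismatch(
                "Supplied fingerprint does not match current metadata fingerprint.")
        self.items = dict(items)
        self.version += 1


def set_metadata_with_retry(store, updates, max_attempts=3):
    """Apply updates, re-reading the fingerprint after each conflict.

    Returns the number of attempts used; re-raises after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        items, fingerprint = store.read()  # fresh copy and fingerprint
        items.update(updates)
        try:
            store.write(items, fingerprint)
            return attempt
        except FingerprintMismatch:
            if attempt == max_attempts:
                raise
```

Whether a retry like this would also have kept the manager from getting wedged is not something the log above shows.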
