Bug #8206

Updated by Peter Amstutz about 7 years ago

An SSL error while starting the daemon actor (the @max_total_price=config.getfloat('Daemon', 'max_total_price')).proxy()@ call in @launcher.py@) stops the node manager.


 Here is the stack trace: 
 <pre> 

 2016-01-14_12:42:41.57638 Traceback (most recent call last): 
 2016-01-14_12:42:41.57641     File "/usr/local/bin/arvados-node-manager", line 6, in <module> 
 2016-01-14_12:42:41.57643       main() 
 2016-01-14_12:42:41.57643     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 125, in main 
 2016-01-14_12:42:41.59825       max_total_price=config.getfloat('Daemon', 'max_total_price')).proxy() 
 2016-01-14_12:42:41.59827     File "/usr/local/lib/python2.7/dist-packages/pykka/actor.py", line 94, in start 
 2016-01-14_12:42:41.59827       obj = cls(*args, **kwargs) 
 2016-01-14_12:42:41.59829     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 123, in __init__ 
 2016-01-14_12:42:41.59844       self._cloud_driver = self._new_cloud() 
 2016-01-14_12:42:41.59846     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/config.py", line 105, in new_cloud_client 
 2016-01-14_12:42:41.59846       self.get_section('Cloud Create')) 
 2016-01-14_12:42:41.59847     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/gce.py", line 36, in __init__ 
 2016-01-14_12:42:41.59847       driver_class) 
 2016-01-14_12:42:41.59847     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/__init__.py", line 40, in __init__ 
 2016-01-14_12:42:41.59848       self.real = driver_class(**auth_kwargs) 
 2016-01-14_12:42:41.59848     File "/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py", line 1053, in __init__ 
 2016-01-14_12:42:41.59862       self.zone_list = self.ex_list_zones() 
 2016-01-14_12:42:41.59863     File "/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py", line 1785, in ex_list_zones 
 2016-01-14_12:42:41.59881       response = self.connection.request(request, method='GET').object 
 2016-01-14_12:42:41.59883     File "/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py", line 120, in request 
 2016-01-14_12:42:41.59889       response = super(GCEConnection, self).request(*args, **kwargs) 
 2016-01-14_12:42:41.59889     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py", line 698, in request 
 2016-01-14_12:42:41.59895       raise e 
 2016-01-14_12:42:41.59895 ssl.SSLError: The read operation timed out 

 </pre> 


 After that, the node manager appears to be stuck. It would be better to retry, or at least to die gracefully. 


 Steps to fix: 

 Put the @self.real@ initialization into a retry loop that retries on cloud errors. 

 Log error backtrace. 
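
 A minimal sketch of what the two steps could look like together. This is not the actual fix: @create_with_retry@, its parameters, and the backoff values are hypothetical names chosen for illustration, and the set of exceptions worth retrying is an assumption.

```python
import logging
import time

logger = logging.getLogger("arvnodeman.sketch")  # hypothetical logger name


def create_with_retry(factory, retry_on=(Exception,), max_tries=5,
                      initial_wait=1.0, max_wait=60.0, sleep=time.sleep):
    """Call factory() until it succeeds or max_tries attempts have failed.

    All names and defaults here are illustrative assumptions, not part of
    arvnodeman or libcloud.
    """
    wait = initial_wait
    for attempt in range(1, max_tries + 1):
        try:
            return factory()
        except retry_on:
            # logger.exception records the message plus the full backtrace.
            logger.exception("Cloud driver init failed (attempt %d of %d)",
                             attempt, max_tries)
            if attempt == max_tries:
                # Out of attempts: re-raise so the process dies visibly
                # instead of hanging.
                raise
            sleep(wait)
            # Exponential backoff, capped at max_wait seconds.
            wait = min(wait * 2, max_wait)
```

 In the wrapper shown in the stack trace, the call might become something like @self.real = create_with_retry(lambda: driver_class(**auth_kwargs), retry_on=(ssl.SSLError,))@. Which exception types the real fix should treat as retryable cloud errors is left open here.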
