Idea #8894

Updated by Brett Smith about 8 years ago

It would be useful for ops if it were clearer that exception tracebacks represent unhandled errors in Node Manager. Exceptions that the code already handles, such as those retried by the RetryMixin, should be logged less verbosely. Here is an example of an exception traceback for an error that was actually handled fine:

 Azure can return a 500:  

 <pre> 
 2016-04-02_12:03:45.58984 2016-04-02 12:03:44 ComputeNodeShutdownActor.feddab943501.compute-xue76h3ns5mmkty-qr1hi[55464] WARNING: Client error: <LibcloudError in <libcloud.common.azure.AzureResponse object at 0x7f7c008ab250> 'Unknown error Status code: 500.'> - waiting 1 seconds
 2016-04-02_12:03:45.58986 Traceback (most recent call last): 
 2016-04-02_12:03:45.58987     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/__init__.py", line 74, in retry_wrapper 
 2016-04-02_12:03:45.58988       ret = orig_func(self, *args, **kwargs) 
 2016-04-02_12:03:45.58988     File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/dispatch/__init__.py", line 227, in shutdown_node 
 2016-04-02_12:03:45.58989       if not self._cloud.destroy_node(self.cloud_node): 
 2016-04-02_12:03:45.58989     File "/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/azure_arm.py", line 654, in destroy_node 
 2016-04-02_12:03:45.58990       node.extra["properties"]["storageProfile"]["osDisk"]["vhd"]["uri"]) 
 2016-04-02_12:03:45.58991     File "/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/azure_arm.py", line 1108, in _ex_delete_old_vhd 
 2016-04-02_12:03:45.58991       blob)) 
 2016-04-02_12:03:45.58992     File "/usr/local/lib/python2.7/dist-packages/libcloud/storage/drivers/azure_blobs.py", line 453, in get_object 
 2016-04-02_12:03:45.58993       response = self.connection.request(object_path, method='HEAD') 
 2016-04-02_12:03:45.58994     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py", line 799, in request 
 2016-04-02_12:03:45.58995       response = responseCls(**kwargs) 
 2016-04-02_12:03:45.58995     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py", line 142, in __init__ 
 2016-04-02_12:03:45.58996       message=self.parse_error(), 
 2016-04-02_12:03:45.58996     File "/usr/local/lib/python2.7/dist-packages/libcloud/common/azure.py", line 91, in parse_error 
 2016-04-02_12:03:45.58997       driver=self 
 2016-04-02_12:03:45.58998 LibcloudError: <LibcloudError in <libcloud.common.azure.AzureResponse object at 0x7f7c008ab250> 'Unknown error Status code: 500.'> 
 </pre> 

 Maybe we could log these tracebacks at a lower logging level, and ops could configure a higher logging level by default? From that moment on, nodemanager was wedged.
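 To illustrate the suggestion, here is a minimal sketch of a retry wrapper that logs a handled error as a one-line warning and emits the traceback only at DEBUG level. It is not the actual RetryMixin code: the names @retry_quietly@, @max_attempts@, @wait@, and the logger name are hypothetical, and only the standard @logging@ module is assumed.

```python
import functools
import logging
import time

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("nodemanager.sketch")


def retry_quietly(errors, max_attempts=3, wait=0.0):
    """Retry the wrapped function on `errors`, keeping handled errors terse."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except errors as error:
                    if attempt == max_attempts:
                        # Out of retries: the error is no longer handled,
                        # so log the full traceback at ERROR and re-raise.
                        logger.exception("Giving up after %d attempts", attempt)
                        raise
                    # Handled case: a single-line warning for ops...
                    logger.warning("Client error: %s - waiting %s seconds",
                                   error, wait)
                    # ...with the traceback available only at DEBUG level.
                    logger.debug("Traceback for retried error", exc_info=True)
                    time.sleep(wait)
        return wrapper
    return decorator
```

 With the logger configured at WARNING or higher, a retried error shows up as one line, and a traceback in the log means the retries were exhausted. Setting the level to DEBUG brings the tracebacks for retried errors back when they are needed.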
