Story #8894

Updated by Brett Smith over 3 years ago

It would be useful for ops if an exception traceback in the logs reliably meant an unhandled error in Node Manager. Exceptions that the code already handles, like the ones retried in the RetryMixin, should be logged less verbosely. Here is an example of an exception traceback for an error that was actually handled fine:

Azure can return a 500:

<pre>
2016-04-02_12:03:45.58984 2016-04-02 12:03:44 ComputeNodeShutdownActor.feddab943501.compute-xue76h3ns5mmkty-qr1hi[55464] WARNING: Client error: <LibcloudError in <libcloud.common.azure.AzureResponse object at 0x7f7c008ab250> 'Unknown error Status code: 500.'> - waiting 1 seconds
2016-04-02_12:03:45.58986 Traceback (most recent call last):
2016-04-02_12:03:45.58987 File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/__init__.py", line 74, in retry_wrapper
2016-04-02_12:03:45.58988 ret = orig_func(self, *args, **kwargs)
2016-04-02_12:03:45.58988 File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/dispatch/__init__.py", line 227, in shutdown_node
2016-04-02_12:03:45.58989 if not self._cloud.destroy_node(self.cloud_node):
2016-04-02_12:03:45.58989 File "/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/azure_arm.py", line 654, in destroy_node
2016-04-02_12:03:45.58990 node.extra["properties"]["storageProfile"]["osDisk"]["vhd"]["uri"])
2016-04-02_12:03:45.58991 File "/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/azure_arm.py", line 1108, in _ex_delete_old_vhd
2016-04-02_12:03:45.58991 blob))
2016-04-02_12:03:45.58992 File "/usr/local/lib/python2.7/dist-packages/libcloud/storage/drivers/azure_blobs.py", line 453, in get_object
2016-04-02_12:03:45.58993 response = self.connection.request(object_path, method='HEAD')
2016-04-02_12:03:45.58994 File "/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py", line 799, in request
2016-04-02_12:03:45.58995 response = responseCls(**kwargs)
2016-04-02_12:03:45.58995 File "/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py", line 142, in __init__
2016-04-02_12:03:45.58996 message=self.parse_error(),
2016-04-02_12:03:45.58996 File "/usr/local/lib/python2.7/dist-packages/libcloud/common/azure.py", line 91, in parse_error
2016-04-02_12:03:45.58997 driver=self
2016-04-02_12:03:45.58998 LibcloudError: <LibcloudError in <libcloud.common.azure.AzureResponse object at 0x7f7c008ab250> 'Unknown error Status code: 500.'>
</pre>

Maybe we could log these tracebacks at a lower logging level, and ops could configure a higher logging level by default? From that moment on, nodemanager was wedged.
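
To make the suggestion concrete, here is a minimal sketch of a retry wrapper that logs a handled error as a one-line WARNING and sends the traceback to DEBUG only. It is not the actual RetryMixin code: the `retry` decorator, its parameters, and the `retry_sketch` logger name are all hypothetical, chosen just to illustrate the idea.

```python
import functools
import logging
import time

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("retry_sketch")


def retry(errors, max_tries=3, wait=1.0, sleep=time.sleep):
    """Retry the wrapped function when it raises one of `errors`.

    This is an illustrative stand-in, not Node Manager's RetryMixin.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except errors as error:
                    if attempt == max_tries:
                        # Out of retries: the error is now unhandled, so let
                        # it propagate with its full traceback.
                        raise
                    # Handled error: one-line summary at WARNING.
                    logger.warning(
                        "Client error: %s - waiting %s seconds", error, wait)
                    # The traceback is only emitted when DEBUG is enabled.
                    logger.debug("Traceback for retried error", exc_info=True)
                    sleep(wait)
        return wrapper
    return decorator
```

With something like this, a deployment that runs the logger at INFO or WARNING would see only the short "Client error" line for retried calls, while a traceback at that level would indicate an error that was not handled.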
