Bug #9120
closed
[Node Manager] AttributeError: 'ComputeNodeDriver' object has no attribute 'ex_list_networks'
Description
This happened in qr2hi (GCP) with version 0.1.20160421180254-1:
2016-04-29_21:05:08.35533 2016-04-29 21:05:08 root[2764] ERROR: Uncaught exception during setup
2016-04-29_21:05:08.35536 Traceback (most recent call last):
2016-04-29_21:05:08.35536   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 110, in main
2016-04-29_21:05:08.35537     server_calculator = build_server_calculator(config)
2016-04-29_21:05:08.35537   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 61, in build_server_calculator
2016-04-29_21:05:08.35537     cloud_size_list = config.node_sizes(config.new_cloud_client().list_sizes())
2016-04-29_21:05:08.35538   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/config.py", line 107, in new_cloud_client
2016-04-29_21:05:08.35538     self.get_section('Cloud Create'))
2016-04-29_21:05:08.35538   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/gce.py", line 36, in __init__
2016-04-29_21:05:08.35539     driver_class)
2016-04-29_21:05:08.35539   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/__init__.py", line 68, in __init__
2016-04-29_21:05:08.35539     new_pair = init_method(self.create_kwargs.pop(key))
2016-04-29_21:05:08.35539   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/gce.py", line 51, in _init_network
2016-04-29_21:05:08.35540     network_name, 'ex_list_networks', self._name_key)
2016-04-29_21:05:08.35540   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/__init__.py", line 113, in search_for
2016-04-29_21:05:08.35541     term, list_method, key, **kwargs)
2016-04-29_21:05:08.35541   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/__init__.py", line 95, in search_for_now
2016-04-29_21:05:08.35541     items = getattr(self, list_method)(**kwargs)
2016-04-29_21:05:08.35542 AttributeError: 'ComputeNodeDriver' object has no attribute 'ex_list_networks'
2016-04-29_21:05:08.36902 Stopping arvados-node-manager
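The last frame of the traceback looks a method up by name on the Node Manager wrapper object itself (`getattr(self, list_method)`), and the wrapper has no `ex_list_networks`. The following is a minimal sketch of that failure mode, not the actual arvnodeman code and not necessarily what the fix in the changeset does. All class names and the `real` attribute are hypothetical, and the sketch assumes the wrapped cloud driver is the object that actually provides `ex_list_networks`.

```python
class FakeCloudDriver:
    """Hypothetical stand-in for the wrapped cloud driver."""

    def ex_list_networks(self):
        return ["default", "private"]


class WrapperWithoutDelegation:
    """Hypothetical wrapper that looks list methods up on itself."""

    def __init__(self, real):
        self.real = real

    def search_for_now(self, list_method):
        # Fails unless the wrapper itself defines the named method.
        return getattr(self, list_method)()


class WrapperWithDelegation(WrapperWithoutDelegation):
    """Same wrapper, but the lookup goes to the wrapped driver."""

    def search_for_now(self, list_method):
        return getattr(self.real, list_method)()


def try_lookup(wrapper):
    """Return the listing, or the error text if the lookup fails."""
    try:
        return wrapper.search_for_now("ex_list_networks")
    except AttributeError as error:
        return str(error)


broken_result = try_lookup(WrapperWithoutDelegation(FakeCloudDriver()))
working_result = try_lookup(WrapperWithDelegation(FakeCloudDriver()))
print(broken_result)
print(working_result)
```

In this sketch the first lookup produces an error of the same shape as the one in the log, while the second returns the network list from the wrapped driver.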
Updated by Nico César over 8 years ago
Updated by Brett Smith over 8 years ago
- Subject changed from AttributeError: 'ComputeNodeDriver' object has no attribute 'ex_list_networks' to [Node Manager] AttributeError: 'ComputeNodeDriver' object has no attribute 'ex_list_networks'
- Status changed from New to In Progress
- Assigned To set to Brett Smith
- Target version set to 2016-05-11 sprint
Updated by Brett Smith over 8 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:497fdb2505efa9a3231c39ec696da6b749d30af2.
Updated by Nico César over 8 years ago
Deployed in qr2hi. Works as expected: it doesn't blow up,
but it brought up 2 nodes when only 1 was needed: https://workbench.qr2hi.arvadosapi.com/pipeline_instances/qr2hi-d1hrv-78qs9xv7ycr2j6s
New bug?
Updated by Brett Smith over 8 years ago
Nico César wrote:
but it brought up 2 nodes when only 1 was needed: https://workbench.qr2hi.arvadosapi.com/pipeline_instances/qr2hi-d1hrv-78qs9xv7ycr2j6s
New bug?
Node Manager can be in a state where it gets an updated job queue before it gets an updated node list. If the timing is just right, it can see that there's a new job in the queue, but still see the node as busy with the previous job in the pipeline (which just finished). In that case, it will boot a new node, even though a complete snapshot of all the system states would show it's not necessary.
This has been true forever, so it's not a "new" bug, no.
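The timing issue described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions, not Node Manager's real scheduling logic: the function `nodes_to_boot`, the node name `compute0`, and the "idle"/"busy" states are all hypothetical. It compares the decision made from a consistent snapshot with the decision made from a fresh job queue paired with a stale node list.

```python
def nodes_to_boot(queued_jobs, node_states):
    """Return how many extra nodes to boot: queued jobs minus idle nodes."""
    idle = sum(1 for state in node_states.values() if state == "idle")
    return max(0, queued_jobs - idle)


# The previous pipeline job just finished and the next one was queued.
fresh_queue = 1

# Consistent snapshot: the node list already shows the node as idle.
fresh_nodes = {"compute0": "idle"}

# Skewed snapshot: the node list has not been refreshed yet, so the
# node still appears busy with the job that just finished.
stale_nodes = {"compute0": "busy"}

consistent = nodes_to_boot(fresh_queue, fresh_nodes)
skewed = nodes_to_boot(fresh_queue, stale_nodes)
print(consistent, skewed)
```

With the consistent snapshot no new node is requested; with the stale node list one extra node is booted, which matches the 2-nodes-for-1-job behavior reported above.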
Updated by Nico César over 8 years ago
Brett Smith wrote:
Node Manager can be in a state where it gets an updated job queue before it gets an updated node list. If the timing is just right, it can see that there's a new job in the queue, but still see the node as busy with the previous job in the pipeline (which just finished). In that case, it will boot a new node, even though a complete snapshot of all the system states would show it's not necessary.
This has been true forever, so it's not a "new" bug, no.
Hmm... it happened 2 out of 2 times with the new version. I will test a couple more times and reopen #9161 if this is the case.