Bug #9120

[Node Manager] AttributeError: 'ComputeNodeDriver' object has no attribute 'ex_list_networks'

Added by Nico César over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assigned To: Brett Smith
Category: -
Target version:
Start date: 05/02/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Description

This happened in qr2hi (GCP) with version 0.1.20160421180254-1:

2016-04-29_21:05:08.35533 2016-04-29 21:05:08 root[2764] ERROR: Uncaught exception during setup
2016-04-29_21:05:08.35536 Traceback (most recent call last):
2016-04-29_21:05:08.35536   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 110, in main
2016-04-29_21:05:08.35537     server_calculator = build_server_calculator(config)
2016-04-29_21:05:08.35537   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/launcher.py", line 61, in build_server_calculator
2016-04-29_21:05:08.35537     cloud_size_list = config.node_sizes(config.new_cloud_client().list_sizes())
2016-04-29_21:05:08.35538   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/config.py", line 107, in new_cloud_client
2016-04-29_21:05:08.35538     self.get_section('Cloud Create'))
2016-04-29_21:05:08.35538   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/gce.py", line 36, in __init__
2016-04-29_21:05:08.35539     driver_class)
2016-04-29_21:05:08.35539   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/__init__.py", line 68, in __init__
2016-04-29_21:05:08.35539     new_pair = init_method(self.create_kwargs.pop(key))
2016-04-29_21:05:08.35539   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/gce.py", line 51, in _init_network
2016-04-29_21:05:08.35540     network_name, 'ex_list_networks', self._name_key)
2016-04-29_21:05:08.35540   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/__init__.py", line 113, in search_for
2016-04-29_21:05:08.35541     term, list_method, key, **kwargs)
2016-04-29_21:05:08.35541   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/computenode/driver/__init__.py", line 95, in search_for_now
2016-04-29_21:05:08.35541     items = getattr(self, list_method)(**kwargs)
2016-04-29_21:05:08.35542 AttributeError: 'ComputeNodeDriver' object has no attribute 'ex_list_networks'
2016-04-29_21:05:08.36902 Stopping arvados-node-manager
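The traceback ends in `search_for_now`, which calls `getattr(self, list_method)` on Node Manager's own driver wrapper. `ex_list_networks` is a libcloud GCE extension method, so it presumably lives on the wrapped libcloud driver rather than on the wrapper, which would explain the AttributeError. Below is a minimal sketch of that failure mode and one possible fallback, assuming the wrapper keeps the libcloud driver in an attribute (called `real` here). `FakeLibcloudDriver`, `WrapperDriver`, `_pick_one`, and both search methods are hypothetical illustrations, not the code merged in revision 497fdb25.

```python
class FakeLibcloudDriver(object):
    """Hypothetical stand-in for a libcloud GCE driver.

    list_sizes() mirrors libcloud's common NodeDriver API, while
    ex_list_networks() mirrors a provider-specific "ex_" extension.
    """

    def list_sizes(self):
        return ["n1-standard-1", "n1-standard-2"]

    def ex_list_networks(self):
        return [{"name": "default"}, {"name": "compute-net"}]


class WrapperDriver(object):
    """Hypothetical Node Manager-style wrapper around a cloud driver.

    It keeps the real driver in self.real and re-exports only the
    common methods it needs, so ex_* methods are not attributes of it.
    """

    def __init__(self, real):
        self.real = real

    def list_sizes(self):
        return self.real.list_sizes()

    @staticmethod
    def _pick_one(items, term, key, list_method):
        # Assumption of this sketch: exactly one item must match the term.
        matches = [item for item in items if key(item) == term]
        if len(matches) != 1:
            raise ValueError("{!r} matched {} items from {}".format(
                term, len(matches), list_method))
        return matches[0]

    def search_for_now_buggy(self, term, list_method, key):
        # Same lookup as in the traceback: getattr() on the wrapper itself,
        # which raises AttributeError for ex_* methods it does not define.
        items = getattr(self, list_method)()
        return self._pick_one(items, term, key, list_method)

    def search_for_now(self, term, list_method, key):
        # Use the wrapper's own method when it has one; otherwise fall back
        # to the wrapped driver, where provider ex_* methods live.
        owner = self if hasattr(self, list_method) else self.real
        items = getattr(owner, list_method)()
        return self._pick_one(items, term, key, list_method)


if __name__ == "__main__":
    driver = WrapperDriver(FakeLibcloudDriver())
    by_name = lambda item: item["name"]
    try:
        driver.search_for_now_buggy("default", "ex_list_networks", by_name)
    except AttributeError as error:
        print("buggy lookup:", error)
    print("fallback lookup:",
          driver.search_for_now("default", "ex_list_networks", by_name))
```

Checking the wrapper first keeps any wrapper-level overrides of common methods such as `list_sizes`, while provider-specific methods are still reachable through the wrapped driver.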

Subtasks

Task #9124: Review 9120-node-manager-search-ex-methods-wip (Resolved, Peter Amstutz)

Associated revisions

Revision 497fdb25
Added by Brett Smith over 5 years ago

Merge branch '9120-node-manager-search-ex-methods-wip'

Closes #9120, #9124.

History

#1 Updated by Nico César over 5 years ago

  • Project changed from OPS to Arvados

#3 Updated by Brett Smith over 5 years ago

  • Subject changed from AttributeError: 'ComputeNodeDriver' object has no attribute 'ex_list_networks' to [Node Manager] AttributeError: 'ComputeNodeDriver' object has no attribute 'ex_list_networks'
  • Status changed from New to In Progress
  • Assigned To set to Brett Smith
  • Target version set to 2016-05-11 sprint

#4 Updated by Peter Amstutz over 5 years ago

LGTM

#5 Updated by Brett Smith over 5 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:497fdb2505efa9a3231c39ec696da6b749d30af2.

#6 Updated by Nico César over 5 years ago

Deployed in qr2hi. Works as expected: it doesn't blow up.

But it brought up 2 nodes when only 1 was needed: https://workbench.qr2hi.arvadosapi.com/pipeline_instances/qr2hi-d1hrv-78qs9xv7ycr2j6s

New bug?

#7 Updated by Brett Smith over 5 years ago

Nico Cesar wrote:

> But it brought up 2 nodes when only 1 was needed: https://workbench.qr2hi.arvadosapi.com/pipeline_instances/qr2hi-d1hrv-78qs9xv7ycr2j6s
>
> New bug?

Node Manager can be in a state where it gets an updated job queue before it gets an updated node list. If the timing is just right, it sees the new job in the queue but still sees the node as busy with the previous job in the pipeline (which has just finished). In that case it boots a new node, even though a complete snapshot of the whole system's state would show that it isn't necessary.

This has been true forever, so it's not a "new" bug, no.
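The race described above comes down to making one decision from two snapshots taken at different times. A minimal sketch, assuming a simplified model where each queued job needs one idle node; `Node` and `nodes_to_boot` are hypothetical names, not Node Manager's actual data structures or sizing logic:

```python
from collections import namedtuple

# Hypothetical snapshot record; not Node Manager's real data structure.
Node = namedtuple("Node", ["name", "busy"])


def nodes_to_boot(queued_jobs, nodes):
    """Return how many new nodes to boot for the given snapshots.

    Each queued job can run on one idle node; any job left over needs a
    new node. The answer is only as good as the two snapshots passed in.
    """
    idle = sum(1 for node in nodes if not node.busy)
    return max(0, len(queued_jobs) - idle)


if __name__ == "__main__":
    # Pipeline step 1 just finished on compute0 and step 2 is now queued.
    fresh_queue = ["step-2"]
    stale_nodes = [Node("compute0", busy=True)]   # still shows step 1 running
    fresh_nodes = [Node("compute0", busy=False)]  # step 1 is done

    print("queue newer than node list:", nodes_to_boot(fresh_queue, stale_nodes))  # 1
    print("consistent snapshot:", nodes_to_boot(fresh_queue, fresh_nodes))  # 0
```

With the stale node list, the same job queue yields one extra node; with a consistent snapshot it yields none.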

#8 Updated by Nico César over 5 years ago

Brett Smith wrote:

> Node Manager can be in a state where it gets an updated job queue before it gets an updated node list. If the timing is just right, it sees the new job in the queue but still sees the node as busy with the previous job in the pipeline (which has just finished). In that case it boots a new node, even though a complete snapshot of the whole system's state would show that it isn't necessary.
>
> This has been true forever, so it's not a "new" bug, no.

Mhh... it happened 2 out of 2 times with the new version. I'll test a couple more times and reopen #9161 if that's the case.
