Bug #8691

[Nodemanager] NodeManagerDaemonActor dies.

Added by Ward Vandewege over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category:
-
Target version:
Start date:
Due date:
% Done: 100%

Estimated time:
Story points:
-

Description

Seen on wx7k5 running node manager version 0.1.20160311203330-1. See @4000000056e756001b5dddf4.s for the full log.

2016-03-14_23:51:57.51077 2016-03-14 23:51:57 pykka[61889] ERROR: Unhandled exception in NodeManagerDaemonActor (urn:uuid:71f062c3-6cbd-43d0-ab25-961900d865d0):
2016-03-14_23:51:57.51079 Traceback (most recent call last):
2016-03-14_23:51:57.51080   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 200, in _actor_loop
2016-03-14_23:51:57.51080     response = self._handle_receive(message)
2016-03-14_23:51:57.51081   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 294, in _handle_receive
2016-03-14_23:51:57.51081     return callee(*message['args'], **message['kwargs'])
2016-03-14_23:51:57.51082   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 346, in wrapper
2016-03-14_23:51:57.51082     return orig_func(self, *args, **kwargs)
2016-03-14_23:51:57.51082   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 398, in stop_booting_node
2016-03-14_23:51:57.51083     if node.cloud_size.get().id == size.id and node.stop_if_no_cloud_node().get():
2016-03-14_23:51:57.51083   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/baseactor.py", line 63, in __getattr__
2016-03-14_23:51:57.51084     raise AttributeError('attribute "%s" is not a callable on %s' % (name, self))
2016-03-14_23:51:57.51084 AttributeError: attribute "cloud_size" is not a callable on <ActorProxy for ComputeNodeSetupActor (urn:uuid:1f47c0b6-2f9c-478d-bc26-7a4c758daae6), attr_path=()>
2016-03-14_23:51:57.51100 2016-03-14 23:51:57 pykka[61889] DEBUG: Unregistered NodeManagerDaemonActor (urn:uuid:71f062c3-6cbd-43d0-ab25-961900d865d0)

Related issues

Related to Arvados - Bug #8678: [NODEMANAGER] ComputeNodeSetupActor dies. (Resolved, 03/10/2016)

Has duplicate Arvados - Bug #8716: [NODEMANAGER] attribute "cloud_size" is not a callable (Duplicate, 03/16/2016)

Associated revisions

Revision 94b84844
Added by Peter Amstutz over 3 years ago

Proxy objects held in node manager dict of booting nodes should be regular proxy(), not
tell_proxy(). fixes #8691
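
To illustrate the distinction this commit message draws, here is a minimal sketch that does not use pykka or the arvnodeman code. The classes `Future`, `SetupActor`, `RegularProxy`, `TellOnlyProxy` and the function `size_matches` are hypothetical stand-ins. The behavior assumed for each proxy kind is inferred from the traceback above: a regular proxy hands back futures for attribute reads and method calls, while a tell-style proxy only accepts method calls and raises `AttributeError` for anything else.

```python
class Future(object):
    """Stand-in for a pykka future: get() returns the stored value."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class SetupActor(object):
    """Hypothetical actor with one plain attribute and one method."""

    def __init__(self):
        self.cloud_size = "small"

    def stop_if_no_cloud_node(self):
        return True


class RegularProxy(object):
    """Models proxy(): attribute reads and method calls both yield futures."""

    def __init__(self, actor):
        self._actor = actor

    def __getattr__(self, name):
        attr = getattr(self._actor, name)
        if callable(attr):
            return lambda *args, **kwargs: Future(attr(*args, **kwargs))
        return Future(attr)


class TellOnlyProxy(object):
    """Models tell_proxy(): methods can be invoked, but nothing is returned."""

    def __init__(self, actor):
        self._actor = actor

    def __getattr__(self, name):
        attr = getattr(self._actor, name)
        if not callable(attr):
            # Same message format as the traceback in this ticket.
            raise AttributeError(
                'attribute "%s" is not a callable on %s' % (name, self))

        def tell(*args, **kwargs):
            attr(*args, **kwargs)  # fire and forget: the result is dropped

        return tell


def size_matches(node, wanted_size):
    """Same shape as the check in stop_booting_node: needs a regular proxy."""
    return (node.cloud_size.get() == wanted_size
            and node.stop_if_no_cloud_node().get())
```

In this sketch, `size_matches(RegularProxy(SetupActor()), "small")` succeeds, while passing a `TellOnlyProxy` fails on the `cloud_size` read with the same kind of `AttributeError` reported here. That is why a collection of nodes whose attributes are read later would need to hold regular proxies.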

Revision d4132692
Added by Peter Amstutz over 3 years ago

Node manager bugfix: late subscribers should get proxy() not _later (which is a tell_proxy())
fixes #8691 note-4
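
The late-subscriber case can be sketched in the same spirit. This is a simplified model under stated assumptions, not the arvnodeman implementation: `ShutdownActor`, its `fixed` flag, and the proxy classes are hypothetical, and the subscribe/finish flow is a guess at the pattern the commit message describes. An actor that has already finished calls a new subscriber back immediately, and the kind of proxy it passes decides whether the subscriber can read attributes such as `cloud_node`.

```python
class Future(object):
    """Stand-in for a pykka future: get() returns the stored value."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class RegularProxy(object):
    """Models proxy(): attribute reads and method calls both yield futures."""

    def __init__(self, actor):
        self._actor = actor

    def __getattr__(self, name):
        attr = getattr(self._actor, name)
        if callable(attr):
            return lambda *args, **kwargs: Future(attr(*args, **kwargs))
        return Future(attr)


class TellOnlyProxy(object):
    """Models tell_proxy(): plain attributes cannot be read."""

    def __init__(self, actor):
        self._actor = actor

    def __getattr__(self, name):
        attr = getattr(self._actor, name)
        if not callable(attr):
            raise AttributeError(
                'attribute "%s" is not a callable on %s' % (name, self))

        def tell(*args, **kwargs):
            attr(*args, **kwargs)  # result is intentionally dropped

        return tell


class ShutdownActor(object):
    """Hypothetical actor that notifies subscribers when it finishes."""

    def __init__(self, fixed):
        self.cloud_node = "node-1"
        self.fixed = fixed        # True models the behavior after the fix
        self.subscribers = []

    def finish(self):
        for callback in self.subscribers:
            callback(RegularProxy(self))
        self.subscribers = None   # None marks the actor as finished

    def subscribe(self, callback):
        if self.subscribers is not None:
            self.subscribers.append(callback)
            return
        # Late subscriber: the actor already finished, so call back now.
        proxy = RegularProxy(self) if self.fixed else TellOnlyProxy(self)
        callback(proxy)


seen = []


def node_finished_shutdown(shutdown_actor):
    """Subscriber that reads an attribute, as in the note-4 traceback."""
    seen.append(shutdown_actor.cloud_node.get())
```

With `fixed=False`, subscribing after `finish()` raises the `AttributeError` on `cloud_node`. With `fixed=True`, the late subscriber receives a regular proxy and the read succeeds. Subscribers registered before `finish()` work in both cases, which may explain why the failure only showed up intermittently.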

History

#1 Updated by Ward Vandewege over 3 years ago

  • Description updated

#2 Updated by Ward Vandewege over 3 years ago

  • Description updated

#3 Updated by Peter Amstutz over 3 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:94b848446ef76ecdebc514a3262e735a59d08e78.

#4 Updated by Ward Vandewege over 3 years ago

  • Status changed from Resolved to In Progress
  • Assigned To set to Peter Amstutz
  • Target version set to 2016-03-30 sprint

Reopening this bug because we saw it again, on node manager version 0.1.20160315133517-1 which includes the referenced commit above.

2016-03-16_21:20:20.97299 2016-03-16 21:20:20 pykka[14916] ERROR: Unhandled exception in NodeManagerDaemonActor (urn:uuid:ff5b116f-aee6-4fc4-957b-55981c940b70):
2016-03-16_21:20:20.97301 Traceback (most recent call last):
2016-03-16_21:20:20.97304   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 200, in _actor_loop
2016-03-16_21:20:20.97305     response = self._handle_receive(message)
2016-03-16_21:20:20.97307   File "/usr/lib/python2.7/dist-packages/pykka/actor.py", line 294, in _handle_receive
2016-03-16_21:20:20.97308     return callee(*message['args'], **message['kwargs'])
2016-03-16_21:20:20.97310   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 436, in node_finished_shutdown
2016-03-16_21:20:20.97311     shutdown_actor, 'cloud_node', 'success', 'cancel_reason')
2016-03-16_21:20:20.97311   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/daemon.py", line 376, in _get_actor_attrs
2016-03-16_21:20:20.97312     return pykka.get_all([getattr(actor, name) for name in attr_names])
2016-03-16_21:20:20.97312   File "/usr/local/lib/python2.7/dist-packages/arvnodeman/baseactor.py", line 63, in __getattr__
2016-03-16_21:20:20.97313     raise AttributeError('attribute "%s" is not a callable on %s' % (name, self))
2016-03-16_21:20:20.97313 AttributeError: attribute "cloud_node" is not a callable on <ActorProxy for ComputeNodeShutdownActor (urn:uuid:0ef28cf9-cf49-48c8-8d81-5b050cdd79d3), attr_path=()>

#5 Updated by Nico César over 3 years ago

Ward Vandewege wrote:

Reopening this bug because we saw it again, on node manager version 0.1.20160315133517-1 which includes the referenced commit above.

[...]

I think this is #8678 ... which is in progress

#6 Updated by Ward Vandewege over 3 years ago

Nico César wrote:

Ward Vandewege wrote:

Reopening this bug because we saw it again, on node manager version 0.1.20160315133517-1 which includes the referenced commit above.

[...]

I think this is #8678 ... which is in progress

No, it's closed, and it references the same 94b8484 as the fix...

#7 Updated by Peter Amstutz over 3 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:d41326922d0f3489ac4a835990be2a1e1f49da12.
