Arvados - Bug #7667: [NodeManager] CloudNodeListMonitorActor stopped reporting, and logs are not helpful enough to diagnose
Peter Amstutz (peter.amstutz@curii.com), 2015-10-28T14:18:33Z, https://dev.arvados.org/issues/7667?journal_id=31841
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/31841/diff?detail_id=31255">diff</a>)</li></ul>
Peter Amstutz (peter.amstutz@curii.com), 2015-10-28T14:27:39Z, https://dev.arvados.org/issues/7667?journal_id=31842
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/31842/diff?detail_id=31256">diff</a>)</li></ul>
Peter Amstutz (peter.amstutz@curii.com), 2015-10-28T15:45:19Z, https://dev.arvados.org/issues/7667?journal_id=31867
<ul><li><strong>Target version</strong> set to <i>2015-11-11 sprint</i></li></ul>
Brett Smith (brett.smith@curii.com), 2015-10-28T18:59:14Z, https://dev.arvados.org/issues/7667?journal_id=31909
<ul><li><strong>Target version</strong> changed from <i>2015-11-11 sprint</i> to <i>Arvados Future Sprints</i></li></ul>
Brett Smith (brett.smith@curii.com), 2015-10-29T15:31:53Z, https://dev.arvados.org/issues/7667?journal_id=31962
<p>Peter and I separately went over the code for the libcloud Azure ARM driver and the Node Manager Azure driver, and convinced ourselves that the timeout is correctly passed through from there to the base libcloud connection class. A timeout is properly specified in our Azure cluster configuration files, too. So if the issue here is timeout-related, it's something more nuanced than just "not setting it."</p>
Brett Smith (brett.smith@curii.com), 2015-11-17T14:16:39Z, https://dev.arvados.org/issues/7667?journal_id=32479
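On the point in Brett Smith's 2015-10-29 comment that a timeout-related cause would have to be "more nuanced": one general possibility, not confirmed for this bug, is that in Python a socket timeout bounds each blocking socket operation rather than the request as a whole. A slow trickle of data, or a stall outside socket I/O, can therefore outlast it. The following minimal sketch bounds total wall-clock time instead. It assumes the cloud call can be wrapped in a plain function. `call_with_deadline` and `slow_list_nodes` are hypothetical names, not Node Manager or libcloud APIs.

```python
import concurrent.futures
import time

# Shared worker pool. A call that overruns its deadline keeps occupying a
# worker thread, because Python threads cannot be forcibly cancelled.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(func, deadline_seconds, *args, **kwargs):
    """Run func(*args, **kwargs) and return its result, or raise TimeoutError
    if the whole call takes longer than deadline_seconds of wall-clock time.

    Hypothetical helper for illustration only.
    """
    future = _pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=deadline_seconds)
    except concurrent.futures.TimeoutError:
        # The caller gets control back; the underlying call is still running.
        raise TimeoutError("%s did not finish within %s seconds"
                           % (getattr(func, "__name__", "call"), deadline_seconds))


def slow_list_nodes():
    # Hypothetical stand-in for a cloud "list nodes" call that stalls.
    time.sleep(1)
    return []


if __name__ == "__main__":
    print(call_with_deadline(lambda: ["node-1"], 1.0))  # ['node-1']
    try:
        call_with_deadline(slow_list_nodes, 0.2)
    except TimeoutError as err:
        print(err)  # slow_list_nodes did not finish within 0.2 seconds
```

The monitor regains control and can log the stall, but the stuck call keeps a worker thread busy in the background. A process restart is still the heavier fallback if calls never return.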
<ul><li><strong>Target version</strong> deleted (<del><i>Arvados Future Sprints</i></del>)</li></ul>
Brett Smith (brett.smith@curii.com), 2016-01-19T17:52:49Z, https://dev.arvados.org/issues/7667?journal_id=34425
<ul><li><strong>Target version</strong> set to <i>Arvados Future Sprints</i></li></ul>
Tom Clegg (tom@curii.com), 2016-02-02T19:31:47Z, https://dev.arvados.org/issues/7667?journal_id=34884
<p>Peter suggested that a good first step would be adding or improving logging statements, so that next time this happens we get more clues.</p>
Tom Clegg (tom@curii.com), 2016-02-02T19:33:21Z, https://dev.arvados.org/issues/7667?journal_id=34885
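Following Peter's suggestion in Tom Clegg's 2016-02-02 comment, here is a minimal sketch of logging that makes a hang diagnosable. It assumes a monitor that periodically calls a list function. It logs a line before the call, so a call that never returns still leaves a trace. It logs failures with a traceback instead of letting them vanish. It reports elapsed time on success. `poll_once` and its parameters are hypothetical. The "got response ... next poll at" wording mirrors the sample log in this issue, but this is not Node Manager's code.

```python
import datetime
import logging
import time

logger = logging.getLogger("CloudNodeListMonitorActor")


def poll_once(list_func, poll_interval_seconds):
    """Call list_func() once, logging before and after the call.

    Hypothetical sketch: returns the items on success, None on failure.
    """
    start = time.time()
    logger.info("sending request")
    try:
        items = list_func()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("request failed after %.2f seconds", time.time() - start)
        return None
    # The next poll is scheduled relative to when this request was sent.
    next_poll = datetime.datetime.fromtimestamp(start + poll_interval_seconds)
    logger.info("got response with %d items in %s seconds, next poll at %s",
                len(items), time.time() - start,
                next_poll.strftime("%Y-%m-%d %H:%M:%S"))
    return items


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(name)s %(levelname)s: %(message)s")
    poll_once(lambda: ["node-1", "node-2"], 60)
    poll_once(lambda: 1 / 0, 60)  # logs the ZeroDivisionError traceback
```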
<ul><li><strong>Subject</strong> changed from <i>[NodeManager] CloudNodeListMonitorActor stopped reporting</i> to <i>[NodeManager] CloudNodeListMonitorActor stopped reporting, and logs are not helpful enough to diagnose</i></li></ul>
Tom Clegg (tom@curii.com), 2016-02-02T19:33:51Z, https://dev.arvados.org/issues/7667?journal_id=34886
<ul><li><strong>Story points</strong> set to <i>1.0</i></li></ul>
Peter Amstutz (peter.amstutz@curii.com), 2016-02-03T20:27:27Z, https://dev.arvados.org/issues/7667?journal_id=34978
<ul><li><strong>Target version</strong> changed from <i>Arvados Future Sprints</i> to <i>2016-02-17 Sprint</i></li></ul>
Peter Amstutz (peter.amstutz@curii.com), 2016-02-03T20:27:39Z, https://dev.arvados.org/issues/7667?journal_id=34979
<ul><li><strong>Assigned To</strong> set to <i>Peter Amstutz</i></li></ul>
Peter Amstutz (peter.amstutz@curii.com), 2016-02-05T16:32:34Z, https://dev.arvados.org/issues/7667?journal_id=35050
<p>Sample log output from Node Manager running this branch:</p>
<pre>
2016-02-05 11:17:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.49520611763 seconds, next poll at 2016-02-05 11:18:57
2016-02-05 11:17:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 0 (booting 0, idle 0, busy 0), shutting down 0
2016-02-05 11:17:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 1.24660396576 seconds, next poll at 2016-02-05 11:18:57
2016-02-05 11:18:17 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 0 items in 20.3000109196 seconds, next poll at 2016-02-05 11:18:57
2016-02-05 11:18:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:18:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:18:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 1 items in 0.10978603363 seconds, next poll at 2016-02-05 11:19:57
2016-02-05 11:18:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 1, up 0 (booting 0, idle 0, busy 0), shutting down 0
2016-02-05 11:18:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 1, up 0 (booting 0, idle 0, busy 0), shutting down 0
2016-02-05 11:18:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Want 1 more Standard_D1 nodes. Booting a node.
2016-02-05 11:18:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 0.970148086548 seconds, next poll at 2016-02-05 11:19:57
2016-02-05 11:18:58 googleapiclient.discovery[1402] INFO: URL being requested: PUT https://c97qk.arvadosapi.com/arvados/v1/nodes/c97qk-7ekkf-b4ksh6y4v1lyy5i?alt=json
2016-02-05 11:18:59 ComputeNodeSetupActor.35f2bfc2f2c0[1402] INFO: Sending create_node request for node size Standard_D1.
2016-02-05 11:19:06 ComputeNodeSetupActor.35f2bfc2f2c0[1402] INFO: Cloud node /subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-b4ksh6y4v1lyy5i-c97qk created.
2016-02-05 11:19:06 googleapiclient.discovery[1402] INFO: URL being requested: PUT https://c97qk.arvadosapi.com/arvados/v1/nodes/c97qk-7ekkf-b4ksh6y4v1lyy5i?alt=json
2016-02-05 11:19:06 ComputeNodeSetupActor.35f2bfc2f2c0[1402] INFO: <a href="https://arvadosapi.com/c97qk-7ekkf-b4ksh6y4v1lyy5i">c97qk-7ekkf-b4ksh6y4v1lyy5i</a> updated properties.
2016-02-05 11:19:06 ComputeNodeSetupActor.35f2bfc2f2c0[1402] INFO: /subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-b4ksh6y4v1lyy5i-c97qk post-create work done.
2016-02-05 11:19:06 ComputeNodeSetupActor.35f2bfc2f2c0[1402] INFO: finished
2016-02-05 11:19:16 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 0 items in 19.2054839134 seconds, next poll at 2016-02-05 11:19:57
2016-02-05 11:19:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:19:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:19:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 1 items in 0.118844985962 seconds, next poll at 2016-02-05 11:20:57
2016-02-05 11:19:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 1, up 1 (booting 1, idle 0, busy 0), shutting down 0
2016-02-05 11:19:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 1.10505390167 seconds, next poll at 2016-02-05 11:20:57
2016-02-05 11:20:22 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 25.246958971 seconds, next poll at 2016-02-05 11:20:57
2016-02-05 11:20:22 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Registering new cloud node /subscriptions/a731f419-596b-4b64-a278-364e76506b06/resourceGroups/c97qk/providers/Microsoft.Compute/virtualMachines/compute-b4ksh6y4v1lyy5i-c97qk
2016-02-05 11:20:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:20:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:20:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 1 items in 0.153210163116 seconds, next poll at 2016-02-05 11:21:57
2016-02-05 11:20:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 1, up 1 (booting 1, idle 0, busy 0), shutting down 0
2016-02-05 11:20:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 1.10489988327 seconds, next poll at 2016-02-05 11:21:57
2016-02-05 11:21:21 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 23.8909299374 seconds, next poll at 2016-02-05 11:21:57
2016-02-05 11:21:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:21:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:21:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.164697170258 seconds, next poll at 2016-02-05 11:22:57
2016-02-05 11:21:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 1 (booting 1, idle 0, busy 0), shutting down 0
2016-02-05 11:21:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 1.03316402435 seconds, next poll at 2016-02-05 11:22:57
2016-02-05 11:21:58 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Cloud node compute-b4ksh6y4v1lyy5i-c97qk is now paired with Arvados node <a href="https://arvadosapi.com/c97qk-7ekkf-b4ksh6y4v1lyy5i">c97qk-7ekkf-b4ksh6y4v1lyy5i</a>
2016-02-05 11:22:19 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 22.2098510265 seconds, next poll at 2016-02-05 11:22:57
2016-02-05 11:22:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:22:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:22:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.264312028885 seconds, next poll at 2016-02-05 11:23:57
2016-02-05 11:22:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 1 (booting 0, idle 0, busy 1), shutting down 0
2016-02-05 11:22:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 1.10327005386 seconds, next poll at 2016-02-05 11:23:57
2016-02-05 11:23:26 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 29.5173959732 seconds, next poll at 2016-02-05 11:23:57
2016-02-05 11:23:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:23:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:23:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.414124965668 seconds, next poll at 2016-02-05 11:24:57
2016-02-05 11:23:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 1 (booting 0, idle 0, busy 1), shutting down 0
2016-02-05 11:23:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 1.3468849659 seconds, next poll at 2016-02-05 11:24:57
2016-02-05 11:24:23 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 26.1978158951 seconds, next poll at 2016-02-05 11:24:57
2016-02-05 11:24:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:24:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:24:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.101679086685 seconds, next poll at 2016-02-05 11:25:57
2016-02-05 11:24:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 1 (booting 0, idle 0, busy 1), shutting down 0
2016-02-05 11:24:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 0.968893051147 seconds, next poll at 2016-02-05 11:25:57
2016-02-05 11:25:22 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 24.9726760387 seconds, next poll at 2016-02-05 11:25:57
2016-02-05 11:25:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:25:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:25:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.121600151062 seconds, next poll at 2016-02-05 11:26:57
2016-02-05 11:25:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 1 (booting 0, idle 1, busy 0), shutting down 0
2016-02-05 11:25:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 0.876556873322 seconds, next poll at 2016-02-05 11:26:57
2016-02-05 11:26:22 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 25.2926030159 seconds, next poll at 2016-02-05 11:26:57
2016-02-05 11:26:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:26:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:26:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.129631996155 seconds, next poll at 2016-02-05 11:27:57
2016-02-05 11:26:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 1 (booting 0, idle 1, busy 0), shutting down 0
2016-02-05 11:26:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 1.01245999336 seconds, next poll at 2016-02-05 11:27:57
2016-02-05 11:27:00 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/discovery/v1/apis/arvados/v1/rest
2016-02-05 11:27:01 ComputeNodeShutdownActor.77891d9d8642.compute-b4ksh6y4v1lyy5i-c97qk[1402] INFO: Starting shutdown
2016-02-05 11:27:20 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 23.3203639984 seconds, next poll at 2016-02-05 11:27:57
2016-02-05 11:27:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:27:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:27:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.0978491306305 seconds, next poll at 2016-02-05 11:28:57
2016-02-05 11:27:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 0.996257066727 seconds, next poll at 2016-02-05 11:28:57
2016-02-05 11:28:21 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 1 items in 24.1674129963 seconds, next poll at 2016-02-05 11:28:57
2016-02-05 11:28:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:28:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:28:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.149247169495 seconds, next poll at 2016-02-05 11:29:57
2016-02-05 11:28:58 ArvadosNodeListMonitorActor.139670005887584[1402] INFO: got response with 258 items in 1.09818291664 seconds, next poll at 2016-02-05 11:29:57
2016-02-05 11:29:04 ComputeNodeShutdownActor.77891d9d8642.compute-b4ksh6y4v1lyy5i-c97qk[1402] INFO: Shutdown success
2016-02-05 11:29:04 googleapiclient.discovery[1402] INFO: URL being requested: PUT https://c97qk.arvadosapi.com/arvados/v1/nodes/c97qk-7ekkf-b4ksh6y4v1lyy5i?alt=json
2016-02-05 11:29:05 ComputeNodeShutdownActor.77891d9d8642.compute-b4ksh6y4v1lyy5i-c97qk[1402] INFO: finished
2016-02-05 11:29:05 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 0 (booting 0, idle 0, busy 0), shutting down 1
2016-02-05 11:29:05 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 0 (booting 0, idle 0, busy 0), shutting down 1
2016-02-05 11:29:22 CloudNodeListMonitorActor.139670033016992[1402] INFO: got response with 0 items in 25.129185915 seconds, next poll at 2016-02-05 11:29:57
2016-02-05 11:29:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/jobs/queue?alt=json
2016-02-05 11:29:57 googleapiclient.discovery[1402] INFO: URL being requested: GET https://c97qk.arvadosapi.com/arvados/v1/nodes?alt=json&limit=10000
2016-02-05 11:29:57 JobQueueMonitorActor.139669996440272[1402] INFO: got response with 0 items in 0.185049057007 seconds, next poll at 2016-02-05 11:30:57
2016-02-05 11:29:57 NodeManagerDaemonActor.1defb2a0d389[1402] INFO: Standard_D1: wishlist 0, up 0 (booting 0, idle 0, busy 0), shutting down 0
</pre>
Peter Amstutz (peter.amstutz@curii.com), 2016-02-05T17:11:35Z, https://dev.arvados.org/issues/7667?journal_id=35053
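The sample log in Peter Amstutz's 2016-02-05 16:32 comment suggests a way to spot the original failure from the logs alone. Every monitor response line announces its next poll time. A monitor that goes silent leaves a "next poll at" time that passes with no newer response. Here is a minimal sketch of that check, assuming the exact line format shown in the sample. `overdue_monitors` and the 120-second grace period are hypothetical choices, not part of Node Manager.

```python
import datetime
import re

# Matches "got response" lines such as:
# 2016-02-05 11:19:16 CloudNodeListMonitorActor.139670033016992[1402] INFO:
#   got response with 0 items in 19.2054839134 seconds, next poll at 2016-02-05 11:19:57
RESPONSE_RE = re.compile(
    r"^\S+ \S+ (?P<actor>\w+)\.[^\[\s]+\[\d+\] INFO: "
    r"got response with \d+ items in [\d.]+ seconds, "
    r"next poll at (?P<next>\S+ \S+)$")
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def overdue_monitors(log_lines, now, grace_seconds=120):
    """Return {actor: promised_next_poll} for monitors whose latest response
    promised a next poll that is more than grace_seconds in the past.

    Hypothetical diagnostic helper for illustration.
    """
    next_poll = {}
    for line in log_lines:
        m = RESPONSE_RE.match(line.rstrip("\n"))
        if m:
            # Later lines overwrite earlier ones, so this keeps the latest promise.
            next_poll[m.group("actor")] = datetime.datetime.strptime(m.group("next"), TS_FORMAT)
    grace = datetime.timedelta(seconds=grace_seconds)
    return {actor: when for actor, when in next_poll.items() if now > when + grace}
```

Run over a log in which CloudNodeListMonitorActor's last line promised a poll at 11:29:57, a check at 11:32:30 flags it. JobQueueMonitorActor promised 11:30:57, so it is still within the grace period and is not flagged.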
<p>Bugs fixed:</p>
<ul>
<li>ComputeNodeSetupActor and ComputeNodeShutdownActor would get stuck forever if an unhandled exception occurred during node creation or shutdown. Such an exception now causes the actor to signal an early finish to its subscribers.</li>
</ul>
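The following minimal sketch illustrates the first fix above, assuming a task whose subscribers are plain callbacks. `SetupTask`, `subscribe`, and `run` are hypothetical names, and the real ComputeNodeSetupActor is not reproduced here. The point is that every path through `run()` reaches the step that notifies subscribers.

```python
import logging

logger = logging.getLogger("ComputeNodeSetupActor")


class SetupTask:
    """Hypothetical stand-in for a setup/shutdown actor.

    Subscribers are callables that receive the task when it finishes,
    whether the work succeeded or raised.
    """

    def __init__(self, work):
        self._work = work
        self._subscribers = []
        self.finished = False
        self.success = None

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def run(self):
        try:
            self._work()
        except Exception:
            # Catch everything so control always reaches _finish(); an
            # escaping exception would skip it and leave subscribers waiting.
            logger.exception("unhandled exception, finishing early")
            self.success = False
        else:
            self.success = True
        self._finish()

    def _finish(self):
        self.finished = True
        for callback in self._subscribers:
            callback(self)


if __name__ == "__main__":
    task = SetupTask(lambda: 1 / 0)  # simulated create_node failure
    task.subscribe(lambda t: print("finished, success =", t.success))
    task.run()  # logs the traceback, then prints: finished, success = False
```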
<ul>
<li>ComputeNodeMonitorActor now considers a node for shutdown once <code>cloud_node_start_time + boot_fail_after</code> has passed, so failed nodes that were not booted by the current process still have a chance to be shut down.</li>
</ul>
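The following minimal sketch illustrates the second fix. It assumes times are Unix timestamps in seconds. It also assumes, for illustration only, that "failed" means the cloud node never paired with an Arvados node record. `boot_failed` and its `paired` argument are hypothetical. Only `cloud_node_start_time` and `boot_fail_after` come from the note above.

```python
import time


def boot_failed(cloud_node_start_time, boot_fail_after, paired, now=None):
    """Return True if an unpaired cloud node has had at least boot_fail_after
    seconds to boot, counting from the cloud node's own start time rather
    than from when this process first saw it.

    Hypothetical helper; a sketch, not the ComputeNodeMonitorActor code.
    """
    if paired:
        return False  # assumed: a paired node booted successfully
    if now is None:
        now = time.time()
    return now >= cloud_node_start_time + boot_fail_after


# A node started at t=1000 by a previous Node Manager process, with
# boot_fail_after=1800, becomes eligible at t=2800 regardless of when
# the current process started.
print(boot_failed(1000, 1800, paired=False, now=3000))  # True
```

Measuring from the node's start time is what lets a restarted process act on such a node. A timer started when a new process first saw the node at t=2500 would not expire until t=4300.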
<ul>
<li>Added exception handlers in other key places.</li>
</ul>
Brett Smith (brett.smith@curii.com), 2016-02-16T16:13:52Z, https://dev.arvados.org/issues/7667?journal_id=35250
<p>Is this just open so we can go back and investigate the originally reported issue of the actor not responding? That makes sense and it's fine if so; I'm just checking for sprint planning purposes.</p>
Brett Smith (brett.smith@curii.com), 2016-02-16T16:26:12Z, https://dev.arvados.org/issues/7667?journal_id=35260
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
Brett Smith (brett.smith@curii.com), 2016-02-17T19:04:32Z, https://dev.arvados.org/issues/7667?journal_id=35380
<ul><li><strong>Assigned To</strong> deleted (<del><i>Peter Amstutz</i></del>)</li><li><strong>Target version</strong> changed from <i>2016-02-17 Sprint</i> to <i>Arvados Future Sprints</i></li></ul>
Brett Smith (brett.smith@curii.com), 2016-04-06T15:42:06Z, https://dev.arvados.org/issues/7667?journal_id=37502
<p>This happened again as of Node Manager 0.1.20160315133517. There was no additional information in the logs. The situation was exactly the same: CloudNodeListMonitorActor logged that it sent a request, and then was never heard from again.</p>
Brett Smith (brett.smith@curii.com), 2016-05-11T17:08:41Z, https://dev.arvados.org/issues/7667?journal_id=38947
<p>We expect this to be mitigated by <a class="issue tracker-6 status-3 priority-4 priority-default closed parent" title="Idea: [NodeManager] Node Manager stops itself when actors stop responding (Resolved)" href="https://dev.arvados.org/issues/8236">#8236</a>.</p>
Peter Amstutz (peter.amstutz@curii.com), 2018-06-12T16:09:58Z, https://dev.arvados.org/issues/7667?journal_id=63386
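Brett Smith's 2016-05-11 comment names #8236 as the mitigation: Node Manager stops itself when actors stop responding. That idea can be sketched as a heartbeat watchdog. This is a minimal illustration under stated assumptions, not the #8236 implementation. `Watchdog`, `heartbeat`, `check`, and `on_stall` are hypothetical names. The caller decides what `on_stall` does, for example exiting so that a process supervisor restarts Node Manager. `check()` would be called periodically, e.g. from its own thread.

```python
import threading
import time


class Watchdog:
    """Hypothetical heartbeat watchdog: actors check in as they make
    progress, and any actor silent for longer than timeout_seconds is
    reported to on_stall."""

    def __init__(self, timeout_seconds, on_stall, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._on_stall = on_stall
        self._clock = clock
        self._last_seen = {}
        self._lock = threading.Lock()

    def heartbeat(self, name):
        # Each actor calls this whenever it completes a unit of work.
        with self._lock:
            self._last_seen[name] = self._clock()

    def check(self):
        """Return the sorted names of stalled actors, calling on_stall if any."""
        now = self._clock()
        with self._lock:
            stalled = sorted(name for name, seen in self._last_seen.items()
                             if now - seen > self._timeout)
        if stalled:
            self._on_stall(stalled)
        return stalled
```

The clock is injectable so the logic can be exercised without waiting. Using `time.monotonic` by default avoids false alarms if the wall clock is adjusted.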
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Closed</i></li></ul><p>Closing this since it is out of date.</p>
Tom Morris (tfmorris@veritasgenetics.com), 2018-09-12T16:50:24Z, https://dev.arvados.org/issues/7667?journal_id=66676
<ul><li><strong>Target version</strong> deleted (<del><i>Arvados Future Sprints</i></del>)</li></ul>