https://dev.arvados.org/
https://dev.arvados.org/favicon.ico?1557688842
2015-08-16T00:01:22Z
Arvados

Arvados - Bug #6996: [Node Manager] a node filled up its /tmp which killed slurm and docker, node unusable. Node manager did not kill the node.
https://dev.arvados.org/issues/6996?journal_id=28870
2015-08-16T00:01:22Z
Ward Vandewege
ward@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/28870/diff?detail_id=28271">diff</a>)</li></ul>

Arvados - Bug #6996: [Node Manager] a node filled up its /tmp which killed slurm and docker, node unusable. Node manager did not kill the node.
https://dev.arvados.org/issues/6996?journal_id=28871
2015-08-16T00:01:35Z
Ward Vandewege
ward@curii.com
<ul><li><strong>Subject</strong> changed from <i>[Node Manager] a node filled up its /tmp which killed slurm. Node manager did not kill the node.</i> to <i>[Node Manager] a node filled up its /tmp which killed slurm and docker, node unusable. Node manager did not kill the node.</i></li></ul>

Arvados - Bug #6996: [Node Manager] a node filled up its /tmp which killed slurm and docker, node unusable. Node manager did not kill the node.
https://dev.arvados.org/issues/6996?journal_id=28879
2015-08-17T14:25:00Z
Brett Smith
brett.smith@curii.com
<p>Remember that in the initial development of Node Manager, we decided it was best to take the conservative approach of only shutting down nodes that assert that they're idle. If nodes are in "weird" states, the expectation is that they might still be doing compute work locally, even if they're having trouble talking to SLURM or Arvados, so shutting them down risks losing the work. Remember also that Node Manager only knows what Arvados and the cloud tell it. It doesn't have any direct insight into the state of the compute node.</p>
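<p>The conservative rule described above can be sketched roughly as follows. This is only an illustration, not Node Manager's actual code: the <code>NodeRecord</code> type, the <code>eligible_for_shutdown</code> function, and the node names <code>compute1</code> and <code>compute2</code> are hypothetical, and the state strings are assumed to mirror the SLURM states mentioned in this thread.</p>

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """Hypothetical record of what Arvados/SLURM report about a node."""
    name: str
    # Reported state only; this sketch has no direct view into the node itself.
    reported_state: str


def eligible_for_shutdown(node: NodeRecord) -> bool:
    """Conservative rule: only a node that asserts it is idle may be shut down.

    Nodes in any other state ("alloc", "down", or anything unexpected) are
    left alone, because they might still be doing compute work locally.
    """
    return node.reported_state == "idle"


# Illustrative data only; compute1 and compute2 are made-up names.
nodes = [
    NodeRecord("compute0", "alloc"),
    NodeRecord("compute1", "idle"),
    NodeRecord("compute2", "down"),
]
to_shut_down = [n.name for n in nodes if eligible_for_shutdown(n)]
```

<p>Under this rule a node reported as <code>down</code> is never selected, which matches the behavior reported in this issue.</p>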
<p>The crunch-dispatch logs show compute0 in the SLURM alloc state starting at 21:19:04, then down at 00:01:20. What state did you want Node Manager to see and respond to?</p>

Arvados - Bug #6996: [Node Manager] a node filled up its /tmp which killed slurm and docker, node unusable. Node manager did not kill the node.
https://dev.arvados.org/issues/6996?journal_id=28892
2015-08-17T21:05:18Z
Brett Smith
brett.smith@curii.com
<ul><li><strong>Target version</strong> changed from <i>Bug Triage</i> to <i>Arvados Future Sprints</i></li></ul>

Arvados - Bug #6996: [Node Manager] a node filled up its /tmp which killed slurm and docker, node unusable. Node manager did not kill the node.
https://dev.arvados.org/issues/6996?journal_id=87786
2020-10-07T15:27:55Z
Ward Vandewege
ward@curii.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Closed</i></li></ul>

Arvados - Bug #6996: [Node Manager] a node filled up its /tmp which killed slurm and docker, node unusable. Node manager did not kill the node.
https://dev.arvados.org/issues/6996?journal_id=88526
2020-11-04T17:26:15Z
Ward Vandewege
ward@curii.com
<ul><li><strong>Target version</strong> deleted (<del><i>Arvados Future Sprints</i></del>)</li></ul>