# Arvados - Bug #5425: [Crunch] SLURM Node failure

Issue history from https://dev.arvados.org/issues/5425

**Peter Amstutz** (2015-03-09 15:05)
- **Subject** changed from *[Crunch] Node failure* to *[Crunch] SLURM Node failure*

**Peter Amstutz** (2015-03-09 15:08)
- **Description** updated

**Peter Amstutz** (2015-03-09 15:12)
- **Description** updated

**Peter Amstutz** (2015-03-11 20:27)
- **Target version** changed from *Bug Triage* to *2015-04-01 sprint*

**Peter Amstutz** (2015-03-11 20:37)
Set a memory limit on each Docker container: total memory divided by the number of concurrent tasks.
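As a rough sketch of this proposal (the helper function and its arguments are hypothetical, not the actual crunch-job code):

```python
import subprocess

# Hypothetical illustration of the proposal above: give each concurrent
# task an equal share of the node's RAM via `docker run --memory`.
def run_task(image, command, total_ram_bytes, ntasks):
    limit = total_ram_bytes // ntasks  # total memory / number of tasks
    return subprocess.call(
        ["docker", "run", "--memory=%d" % limit, image] + command)
```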
**Peter Amstutz** (2015-03-18 18:51)

- **Category** set to *Crunch*
- **Assigned To** set to *Peter Amstutz*

**Peter Amstutz** (2015-03-19 14:49)
- **Status** changed from *New* to *In Progress*

**Peter Amstutz** (2015-03-23 15:11)
Some notes relating to the 5425-set-docker-memory-limits branch.

The branch uses `docker run --memory=xxx`. Based on some experimentation, this limits the resident set size of the container, but not the total virtual memory available to the application. That has a few implications:

- If the container exceeds the limit set by `--memory`, it won't automatically fail; it will just spill over into swap.
- Since the goal is to leave some breathing room for system processes (the SLURM daemon, arv-mount, sshd, etc.) rather than oversubscribe the task containers, it makes more sense to allocate (RAM / tasks) * 95% per task, reserving 5% of RAM for everything outside the containers (see the sketch after this list).
- If we wanted to let the tasks compete for memory but still cap total memory usage to leave room for the host processes, we would have to put all the task containers in one big parent container with a memory limit set. I'm not sure of the best way to do that.
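A minimal sketch of the allocation from the second bullet, assuming the 5% reservation described there (the function name and example figures are illustrative, not taken from the branch):

```python
# Illustrative only: per-task memory limit that reserves 5% of total RAM
# for host processes (SLURM daemon, arv-mount, sshd) outside the containers.
def memory_limit_per_task(total_ram_bytes, ntasks, host_fraction=0.05):
    return int((total_ram_bytes / ntasks) * (1.0 - host_fraction))

# Example: an 8 GiB node running 4 concurrent tasks gives each container
# (8 GiB / 4) * 0.95, about 1.9 GiB, leaving roughly 410 MiB for the host.
```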
**Tom Clegg** (2015-03-23 16:42)
Conservative memory limits will cause jobs to run unnecessarily slowly:

- When a worker runs 4 tasks, 3 of them finish, and 1 keeps running: 3/4 of RAM is free while the 1 running task is swapping.
- When concurrent tasks actively use a lot of RAM, but not all at the same time: some RAM is idle while other tasks swap.
This is suboptimal, but it is a step toward using a single worker node to run multiple [users'] jobs at the same time, which is good.

Combined with better reporting/graphing of swap activity, this should also help users determine the optimal memory/concurrency profile for their jobs (without going as far as killing them off when they hit some hard vsize limit).
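Crunch doesn't collect this yet; as a sketch of the kind of signal such graphs could sample, Linux exposes cumulative swap-in/swap-out page counters in `/proc/vmstat`:

```python
import time

# Illustrative sketch, not part of Crunch: measure pages swapped in and
# out over an interval by sampling Linux's cumulative vmstat counters.
def swap_activity(interval=10.0):
    def read_vmstat():
        with open("/proc/vmstat") as f:
            return {key: int(value) for key, value in
                    (line.split() for line in f)}
    before = read_vmstat()
    time.sleep(interval)
    after = read_vmstat()
    return {key: after[key] - before[key] for key in ("pswpin", "pswpout")}
```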
**Tom Clegg** (2015-03-23 17:00)

The code in 5425-set-docker-memory-limits lgtm @ [c96d48a](https://dev.arvados.org/projects/arvados/repository/arvados/revisions/c96d48a6a5c06a36fb3931c1c9650131e21d79c5).

However, I've tried some tests locally, and setting `--memory` seems to kill my processes when their swapped-out memory size reaches somewhere between 1x and 2x RSS, instead of just letting them swap. Is this expected?
**Tom Clegg** (2015-03-23 17:33)

I see. Docker 1.2 defaults to limiting swap to 2x the given `--memory`, but doesn't yet have the `--memory-swap=-1` option mentioned in the [docs](https://docs.docker.com/reference/run/#runtime-constraints-on-cpu-and-memory) that could override that.

I'm OK with trying the default swap limit. If users actually need more swap, I guess we'll just have to [bump docker dependencies and] add a `--memory-swap=-1` there.
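For reference, on a Docker version whose `docker run` accepts `--memory-swap` (per the 1.5 docs above, `-1` means unlimited swap), the override would look something like this hypothetical sketch:

```python
import subprocess

# Sketch of the eventual override discussed above; assumes a Docker
# version that supports --memory-swap. Not applicable to Docker 1.3.
def run_with_unlimited_swap(image, command, memory_bytes):
    return subprocess.call(
        ["docker", "run",
         "--memory=%d" % memory_bytes,  # RSS cap
         "--memory-swap=-1",            # lift the default 2x swap cap
         image] + command)
```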
**Peter Amstutz** (2015-03-23 17:37)

Tom Clegg wrote:

> This is suboptimal, but it is a step toward using a single worker node to run multiple [users'] jobs at the same time, which is good.

I agree, and that's why I noted that some kind of container-inside-container resource allocation might eventually be a better solution. However, the specific problem I'm trying to solve here is that the entire node can fall over due to resource exhaustion, in which case it is probably better for the job to run slowly than to mysteriously blow up with a SLURM node failure. There is definitely more we will want to do to improve resource-usage reporting (for example, "I noticed your job has lots of major page faults; maybe you should lower the number of concurrent tasks").

> The code in 5425-set-docker-memory-limits lgtm @ [c96d48a](https://dev.arvados.org/projects/arvados/repository/arvados/revisions/c96d48a6a5c06a36fb3931c1c9650131e21d79c5).
>
> However, I've tried some tests locally, and setting `--memory` seems to kill my processes when their swapped-out memory size reaches somewhere between 1x and 2x RSS, instead of just letting them swap. Is this expected?

The [Docker run reference](https://docs.docker.com/reference/run/#runtime-constraints-on-cpu-and-memory) suggests that `--memory` doesn't limit swap usage unless you explicitly say so, but that seems to be the Docker 1.5 documentation. We're using Docker 1.3, which appears to have no option to limit swap usage, so I'm not clear on what the Docker 1.3 behavior is.

**Peter Amstutz** (2015-03-31 15:17)
- **Status** changed from *In Progress* to *Resolved*

Re-ran the original job, which completed successfully. Marking this as resolved.