https://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422018-10-10T15:44:30ZArvadosArvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=676042018-10-10T15:44:30ZTom Cleggtom@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-3 priority-4 priority-default closed parent" href="/issues/14324">Feature #14324</a>: [crunch-dispatch-cloud] Azure driver</i> added</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=676062018-10-10T15:44:33ZTom Cleggtom@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-4 priority-default closed" href="/issues/13964">Bug #13964</a>: crunch-dispatch-cloud spike</i> added</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=676082018-10-10T15:44:39ZTom Cleggtom@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-6 status-3 priority-4 priority-default closed" href="/issues/13908">Idea #13908</a>: [Epic] Replace SLURM for cloud job scheduling/dispatching</i> added</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=676182018-10-10T16:59:04ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/67618/diff?detail_id=64666">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=676192018-10-10T17:08:20ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/67619/diff?detail_id=64667">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=676202018-10-10T17:31:43ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/67620/diff?detail_id=64668">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=677902018-10-16T15:07:34ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/67790/diff?detail_id=64810">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=678162018-10-17T05:59:03ZTom Cleggtom@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-6 status-3 priority-4 priority-default closed parent" href="/issues/14360">Idea #14360</a>: [crunch-dispatch-cloud] Merge incomplete implementation</i> added</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=688472018-11-14T21:29:09ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/68847/diff?detail_id=65944">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=688712018-11-15T19:31:04ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/68871/diff?detail_id=65972">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=688752018-11-15T20:20:48ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/68875/diff?detail_id=65977">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=689382018-11-19T15:05:17ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/68938/diff?detail_id=66054">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=696372018-12-07T22:02:42ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/69637/diff?detail_id=66724">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=697502018-12-12T16:46:40ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Target version</strong> set to <i>To Be Groomed</i></li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=697752018-12-12T21:23:44ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/69775/diff?detail_id=66868">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=699132018-12-17T14:53:21ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/69913/diff?detail_id=66979">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700242018-12-18T20:53:23ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70024/diff?detail_id=67113">diff</a>)</li><li><strong>Target version</strong> deleted (<del><i>To Be Groomed</i></del>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700472018-12-19T15:43:41ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70047/diff?detail_id=67140">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700502018-12-19T15:51:11ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70050/diff?detail_id=67141">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700512018-12-19T15:55:06ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70051/diff?detail_id=67142">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700562018-12-19T17:56:55ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70056/diff?detail_id=67144">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700582018-12-19T18:07:26ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70058/diff?detail_id=67146">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700592018-12-19T18:15:19ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> set to <i>Arvados Future Sprints</i></li><li><strong>Story points</strong> set to <i>4.0</i></li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700772018-12-20T17:55:27ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70077/diff?detail_id=67166">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700792018-12-20T18:08:52ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Management APIs should return {"items": [...]} not {"Items": [...]} for consistency with the Arvados API.</p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700802018-12-20T18:09:26ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70080/diff?detail_id=67167">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700822018-12-20T18:39:25ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70082/diff?detail_id=67168">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700832018-12-20T18:42:18ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70083/diff?detail_id=67169">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700982018-12-21T20:38:55ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/70098/diff?detail_id=67186">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=700992018-12-21T20:39:55ZTom Cleggtom@curii.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=701652019-01-02T16:21:52ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Target version</strong> changed from <i>Arvados Future Sprints</i> to <i>2019-01-16 Sprint</i></li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=706902019-01-16T16:14:32ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> changed from <i>2019-01-16 Sprint</i> to <i>2019-01-30 Sprint</i></li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=709042019-01-25T21:55:02ZTom Cleggtom@curii.com
<ul></ul><blockquote>
<ul>
<li>Ops mechanism for draining a node (e.g., curl command using a management token) -- see <a class="wiki-page" href="https://dev.arvados.org/projects/arvados/wiki/Dispatching_containers_to_cloud_VMs#Management-API">Dispatching containers to cloud VMs</a> "Management API"</li>
</ul>
</blockquote>
<p>Added "hold" and "drain". (Wiki also mentions a "kill" API -- not included here.)</p>
<blockquote>
<ul>
<li>Resource consumption metrics (number of instances, number of containers running, total hourly price of all existing VMs) -- see <a class="wiki-page" href="https://dev.arvados.org/projects/arvados/wiki/Dispatching_containers_to_cloud_VMs#Metrics">Dispatching containers to cloud VMs</a> "Metrics"</li>
</ul>
</blockquote>
<p>Added total hourly price. The others were already in place.</p>
<blockquote>
<ul>
<li>Drain (not kill) instances that exist at startup, fail boot probe, but are already running containers -- see <a class="wiki-page" href="https://dev.arvados.org/projects/arvados/wiki/Dispatching_containers_to_cloud_VMs#Special-cases-synchronizing-state">Dispatching containers to cloud VMs</a> "Special cases / synchronizing state"</li>
</ul>
</blockquote>
<p>Added what the wiki says, which is a little different:</p>
<p>"...instances are left alive at least until the containers finish. After that, the usual rules apply: if boot probe succeeds before boot timeout, start scheduling containers; otherwise, shut down."</p>
<p>This is more consistent with the "inherited node is <em>not</em> running a container and fails boot probe" case: we allow the boot timeout to run out before killing it, rather than expecting its boot probe to succeed before the existing container finishes.</p>
<blockquote>
<ul>
<li>Configurable port number for connecting to VM SSH servers</li>
</ul>
</blockquote>
<p>CloudVMs→SSHPort can be given as a name ("ssh") or number ("22").</p>
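<p>As a minimal sketch of how a setting like this can accept either form: Go's standard <code>net.LookupPort</code> resolves both service names and decimal port strings. The helper name <code>resolveSSHPort</code> is hypothetical, and this is not necessarily how the dispatcher itself parses the value.</p>

```go
package main

import (
	"fmt"
	"net"
)

// resolveSSHPort is a hypothetical helper (not taken from the dispatcher
// source). It accepts a service name such as "ssh" or a decimal port
// string such as "22" and returns the numeric TCP port.
func resolveSSHPort(port string) (int, error) {
	return net.LookupPort("tcp", port)
}

func main() {
	for _, p := range []string{"ssh", "22"} {
		n, err := resolveSSHPort(p)
		fmt.Println(p, n, err)
	}
}
```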
<blockquote>
<ul>
<li>Pass API host and dispatcher's token to crunch-run command via <code>ARVADOS_API_*</code> environment variables</li>
</ul>
</blockquote>
<p>Added.</p>
<blockquote>
<ul>
<li>Test SSH host key verification (dispatcher's token is not sent to a remote host unless the host's SSH key passes the VerifyHostKey() method provided by the cloud driver)</li>
</ul>
</blockquote>
<p>Added.</p>
<blockquote>
<ul>
<li>Test container.Queue using real railsAPI/controller</li>
</ul>
</blockquote>
<p>Added. This revealed an SDK bug, now fixed; see <a class="changeset" title="14325: Fix dropped request params when body not specified by caller. Arvados-DCO-1.1-Signed-off-..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/f696f142eb5dcc2b5daac56ea38f457c4106a8a7">f696f142e</a>.</p>
<blockquote>
<ul>
<li>Test resuming state after restart (some instances are booting, some idle, some running containers, some draining, some on admin-hold)</li>
</ul>
</blockquote>
<p>Added restart/resume test to confirm "hold" and instance-type labels are saved/loaded effectively.</p>
<p>Added a slew of worker tests to confirm proper state changes in probeAndUpdate.</p>
<blockquote>
<ul>
<li>Cancel container after some number of start/requeue cycles (i.e., <code>crunch-run --detach</code> succeeded, but child exited without moving container past Locked state)</li>
</ul>
</blockquote>
<p>Didn't do this. (We've already implemented it on the API side.)</p>
<blockquote>
<ul>
<li>Cancel container with no suitable instance type</li>
</ul>
</blockquote>
<p>Added.</p>
<blockquote>
<ul>
<li>Enable package build</li>
</ul>
</blockquote>
<p>Uncommented.</p>
<blockquote>
<ul>
<li>Handle cloud API ratelimit errors (obey holdoff time returned by driver... incl. test)</li>
</ul>
</blockquote>
<p>Added.</p>
<blockquote>
<ul>
<li>Update management API response format (lowercase keys)</li>
</ul>
</blockquote>
<p>Updated.</p>
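<p>For illustration, lowercase keys in a Go JSON response are usually obtained with struct tags. The type and field names below are hypothetical, not the dispatcher's actual response types.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// instanceView and listResponse are hypothetical types for this example.
type instanceView struct {
	Instance string `json:"instance"`
}

// listResponse encodes its slice under the lowercase key "items",
// matching the {"items": [...]} convention requested in the review.
type listResponse struct {
	Items []instanceView `json:"items"`
}

// encodeList returns the JSON encoding of a list response.
func encodeList(items []instanceView) (string, error) {
	buf, err := json.Marshal(listResponse{Items: items})
	return string(buf), err
}

func main() {
	s, err := encodeList([]instanceView{{Instance: "example-1"}})
	fmt.Println(s, err)
}
```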
<blockquote>
<ul>
<li>Confirm all probe failures are logged once instance is booted (see <a class="issue tracker-6 status-3 priority-4 priority-default closed parent" title="Idea: [crunch-dispatch-cloud] Merge incomplete implementation (Resolved)" href="https://dev.arvados.org/issues/14360#note-38">#14360#note-38</a>, fixed in <a class="changeset" title="14360: Fix error log level on first probe after boot. Arvados-DCO-1.1-Signed-off-by: Tom Clegg <..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/7a047d8b6fb8f5a1e0a0749d45a9e99b34a5779c">7a047d8b6</a>)</li>
</ul>
</blockquote>
<p>Confirmed.</p>
<p>14325-dispatch-cloud @ <a class="changeset" title="14325: Don't count busy workers with state=Unknown as Unallocated. Arvados-DCO-1.1-Signed-off-by..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/b105602902e38f18a48505e2091ffea77b2c7c89">b105602902e38f18a48505e2091ffea77b2c7c89</a> <a class="external" href="https://ci.curoverse.com/view/Developer/job/developer-run-tests/1040/">https://ci.curoverse.com/view/Developer/job/developer-run-tests/1040/</a></p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=709232019-01-28T14:46:58ZTom Cleggtom@curii.com
<ul></ul><p>Now at <a class="changeset" title="14325: Clean up test suite logging. Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg@veritasgene..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/a27b2bf3e33a80213a42dcf1e01144209eb2603a">a27b2bf3e</a> with some test cleanup (move LameInstanceSet's one remaining useful feature to StubDriver and retire LameInstanceSet).</p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=710042019-01-30T16:12:33ZTom Cleggtom@curii.com
<ul><li><strong>Target version</strong> changed from <i>2019-01-30 Sprint</i> to <i>2019-02-13 Sprint</i></li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=710162019-01-30T16:23:18ZTom Cleggtom@curii.com
<ul><li><strong>Story points</strong> changed from <i>4.0</i> to <i>1.0</i></li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=710332019-01-30T16:48:15ZTom Cleggtom@curii.com
<ul><li><strong>Precedes</strong> <i><a class="issue tracker-6 status-3 priority-4 priority-default closed parent" href="/issues/14796">Idea #14796</a>: [crunch-dispatch-cloud] Document installation / migration from c-d-slurm + node manager</i> added</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=710782019-01-31T22:35:37ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>worker.shutdownIfIdle():<br /><pre>
if !(wkr.state == StateIdle || (wkr.state == StateBooting && wkr.idleBehavior == IdleBehaviorDrain)) {
	return false
}
</pre></p>
<p>The double-negative logic (do nothing if these things are NOT true...) makes this expression hard to read. Please add comments clarifying the intention that we want to shut down when certain things are true.</p>
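<p>One possible way to state the same guard positively, as a sketch only. The constant names come from the quoted code, but their values, the type definitions and the helper <code>eligibleForShutdown</code> are assumptions made so the example compiles on its own.</p>

```go
package main

import "fmt"

// State and IdleBehavior are stand-ins for the worker package's types;
// the numeric values are assumptions for this example.
type State int
type IdleBehavior int

const (
	StateUnknown State = iota
	StateBooting
	StateIdle
)

const (
	IdleBehaviorRun IdleBehavior = iota
	IdleBehaviorHold
	IdleBehaviorDrain
)

// skipOriginal mirrors the quoted guard: true means "return false".
func skipOriginal(s State, b IdleBehavior) bool {
	return !(s == StateIdle || (s == StateBooting && b == IdleBehaviorDrain))
}

// eligibleForShutdown (hypothetical name) states the intent positively:
// shut down an idle worker, or a booting worker that is being drained.
func eligibleForShutdown(s State, b IdleBehavior) bool {
	idle := s == StateIdle
	drainingWhileBooting := s == StateBooting && b == IdleBehaviorDrain
	return idle || drainingWhileBooting
}

func main() {
	fmt.Println(eligibleForShutdown(StateIdle, IdleBehaviorRun))
	fmt.Println(eligibleForShutdown(StateBooting, IdleBehaviorRun))
	fmt.Println(eligibleForShutdown(StateBooting, IdleBehaviorDrain))
}
```

<p>The caller would then read <code>if !eligibleForShutdown(...) { return false }</code>, which keeps a single negation at the call site.</p>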
<pre>
if wkr.idleBehavior != IdleBehaviorDrain && age < wkr.wp.timeoutIdle {
	return false
}
</pre>
<p>Same comment about confusing expression. If I'm understanding the intended behavior, it would be clearer to write <code>wkr.idleBehavior == IdleBehaviorRun && age < wkr.wp.timeoutIdle</code> because the IdleBehaviorHold case has already been eliminated, and IdleBehaviorDrain ignores the timeout (but having IdleBehaviorDrain and timeoutIdle appear on the same line implies they are related).</p>
<p>Queue.Update():<br /><pre>
if _, keep := cq.dontupdate[uuid]; keep {
	continue
}
...
if _, keep := cq.dontupdate[uuid]; keep {
	continue
} else if _, keep = next[uuid]; keep {
	continue
} else {
	delete(cq.current, uuid)
}
</pre></p>
<p>Last time I commented that "keeplocal" was confusing, and it was renamed to "dontupdate", but there are still a few local variables called "keep" and I don't know how to read them. Should those also be called "dontupdate"? Maybe add some comments?</p>
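<p>A sketch of how the second loop might read with more descriptive names. This is one guess at the intent (drop entries that are neither protected from updates nor present in the newly fetched set). The function <code>pruneCurrent</code>, its parameters and the local variable names are hypothetical.</p>

```go
package main

import "fmt"

// pruneCurrent is a hypothetical standalone version of the quoted loop.
// It deletes entries from current unless the UUID is listed in dontupdate
// or still appears in next.
func pruneCurrent(current map[string]string, dontupdate, next map[string]struct{}) {
	for uuid := range current {
		if _, skipUpdate := dontupdate[uuid]; skipUpdate {
			// Listed in dontupdate: leave the local entry alone.
			continue
		}
		if _, stillPresent := next[uuid]; stillPresent {
			// Still present in the newly fetched set: keep it.
			continue
		}
		// Neither protected nor present any more: forget it.
		delete(current, uuid)
	}
}

func main() {
	current := map[string]string{"a": "Locked", "b": "Queued", "c": "Queued"}
	dontupdate := map[string]struct{}{"a": {}}
	next := map[string]struct{}{"b": {}}
	pruneCurrent(current, dontupdate, next)
	fmt.Println(len(current))
}
```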
<p>In <code>Queue.addEnt()</code> there's an embedded assumption that if the current dispatcher can't find an instance type for a container, nobody can, so it should always cancel the container (even if it has to lock it first). I think that's fine (heterogeneous dispatchers have complexity we don't want to get into yet, if ever) but it should probably be mentioned in a comment.</p>
<p>worker.probeAndUpdate():<br /><pre>
for _, uuid := range ctrUUIDs {
	running[uuid] = struct{}{}
	if _, ok := wkr.running[uuid]; !ok {
		changed = true
	}
}
</pre></p>
<p>Another place that would benefit from some more comments expressing the intent / context of the code. I think what this is doing is determining whether a container UUID was found on the node that isn't present in <code>wkr.running</code>. The next block looks like it checks the opposite case, where a container is known to <code>wkr.running</code> but not present on the instance.</p>
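<p>The two checks described above could be commented along these lines. This is a standalone sketch based on the reading given in this comment, and the function <code>runningSetChanged</code> and its parameters are hypothetical.</p>

```go
package main

import "fmt"

// runningSetChanged is a hypothetical standalone version of the logic.
// known is the previously recorded set of container UUIDs; reported is
// the list just returned by probing the instance. It returns the new set
// and whether it differs from known.
func runningSetChanged(known map[string]struct{}, reported []string) (map[string]struct{}, bool) {
	running := make(map[string]struct{}, len(reported))
	changed := false
	for _, uuid := range reported {
		running[uuid] = struct{}{}
		if _, ok := known[uuid]; !ok {
			// Found on the instance, but not previously known.
			changed = true
		}
	}
	for uuid := range known {
		if _, ok := running[uuid]; !ok {
			// Previously known, but no longer on the instance.
			changed = true
		}
	}
	return running, changed
}

func main() {
	known := map[string]struct{}{"ctr-1": {}}
	_, changed := runningSetChanged(known, []string{"ctr-1", "ctr-2"})
	fmt.Println(changed)
}
```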
<pre>
if wkr.state == StateUnknown || wkr.state == StateBooting {
	wkr.state = StateIdle
}
</pre>
<p>Getting to this point in the code implies that probeBooted() and probeRunning() both passed successfully; a comment making that assumption explicit would help.</p>
<p>... to be continued ....</p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=710822019-02-01T15:36:37ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Pool.Unallocated():<br /><pre>
if !(wkr.state == StateIdle || wkr.state == StateBooting || wkr.state == StateUnknown) || wkr.idleBehavior != IdleBehaviorRun || len(wkr.running) > 0 {
	continue
}
</pre></p>
<p>This line is way too long.</p>
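<p>One way the long condition might be broken up, shown as a sketch. The helper <code>countsAsUnallocated</code>, the type definitions and the extra <code>StateRunning</code> constant are assumptions added so the example compiles on its own.</p>

```go
package main

import "fmt"

// State and IdleBehavior are stand-ins; values are assumptions.
type State int
type IdleBehavior int

const (
	StateUnknown State = iota
	StateBooting
	StateIdle
	StateRunning // assumed extra state, used here as a non-matching case
)

const (
	IdleBehaviorRun IdleBehavior = iota
	IdleBehaviorHold
	IdleBehaviorDrain
)

// skipOriginal mirrors the quoted one-line condition (true means "continue").
func skipOriginal(s State, b IdleBehavior, running int) bool {
	return !(s == StateIdle || s == StateBooting || s == StateUnknown) || b != IdleBehaviorRun || running > 0
}

// countsAsUnallocated (hypothetical name) splits the test into named parts.
func countsAsUnallocated(s State, b IdleBehavior, running int) bool {
	stateOK := s == StateIdle || s == StateBooting || s == StateUnknown
	schedulable := b == IdleBehaviorRun
	empty := running == 0
	return stateOK && schedulable && empty
}

func main() {
	fmt.Println(countsAsUnallocated(StateIdle, IdleBehaviorRun, 0))
	fmt.Println(countsAsUnallocated(StateRunning, IdleBehaviorRun, 0))
}
```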
<p>Similar comment to earlier about hard-to-follow double-negative logic. Here <code>!(wkr.state == StateIdle || wkr.state == StateBooting || wkr.state == StateUnknown)</code> is much clearer when written as <code>(wkr.state != StateIdle && wkr.state != StateBooting && wkr.state != StateUnknown)</code>.</p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=710832019-02-01T16:18:18ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><blockquote><blockquote>
<p>Cancel container after some number of start/requeue cycles (i.e., crunch-run --detach succeeded, but child exited without moving container past Locked state)</p>
</blockquote></blockquote>
<blockquote>
<p>Didn't do this. (We've already implemented it on the API side.)</p>
</blockquote>
<p>We've agreed to do so, but haven't actually done it yet (<a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: [API] Limit number of lock/unlock cycles for a given container (Resolved)" href="https://dev.arvados.org/issues/11561">#11561</a>)</p>
<pre>
# git.curoverse.com/arvados.git/lib/dispatchcloud/container
./queue_test.go:38:17: undefined: test
./queue_test.go:95:17: undefined: test
FAIL git.curoverse.com/arvados.git/lib/dispatchcloud/container [build failed]
</pre>
<pre>
import (
"github.com/julienschmidt/httprouter"
)
</pre>
<p>What's the goal of introducing yet another routing framework here? We already use both http.ServeMux and gorilla/mux.</p>
<pre>
# Layouter fails if we add these
</pre>
<p>Maybe use graphviz instead? (Requires slightly different notation).</p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=710922019-02-01T21:47:02ZTom Cleggtom@curii.com
<ul></ul><p>Indeed, those are some confusing conditional expressions, thanks. Clarified and added comments.</p>
<p>Peter Amstutz wrote:</p>
<blockquote><blockquote>
<p>Didn't do this. (We've already implemented it on the API side.)</p>
</blockquote>
<p>We've agreed to do so, but haven't actually done it yet (<a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: [API] Limit number of lock/unlock cycles for a given container (Resolved)" href="https://dev.arvados.org/issues/11561">#11561</a>)</p>
</blockquote>
<p>Ah, indeed.</p>
<blockquote>
<p>What's the goal of introducing yet another routing framework here? We already use both http.ServeMux and gorilla/mux.</p>
</blockquote>
<p>Cheap, easy to use, does what we need (filter on methods + extract path params), didn't think of a reason not to use it. (It happens to be much more efficient with time and memory than gorilla, not that that's a big concern here.)</p>
<blockquote>
<p>Maybe use graphviz instead? (Requires slightly different notation).</p>
</blockquote>
<p>Maybe. I didn't find this ascii art exercise particularly rewarding. If I were to sink more time into different ways of doing this, I'd probably just give up and make a drawing in Google Drive.</p>
<p>lib/dispatchcloud/container tests are fixed.</p>
<p>14325-dispatch-cloud @ <a class="changeset" title="14325: Merge branch 'master' Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg@veritasgenetics.com>" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/71fd4da18b22100682ae7e2079aadfd66360d310">71fd4da18b22100682ae7e2079aadfd66360d310</a> <a class="external" href="https://ci.curoverse.com/view/Developer/job/developer-run-tests/1051/">https://ci.curoverse.com/view/Developer/job/developer-run-tests/1051/</a></p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=711192019-02-04T22:17:40ZTom Cleggtom@curii.com
<ul><li><strong>Precedes</strong> <i><a class="issue tracker-6 status-3 priority-4 priority-default closed parent" href="/issues/14807">Idea #14807</a>: [arvados-dispatch-cloud] Features/fixes needed before first production deploy</i> added</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=711202019-02-04T22:18:31ZTom Cleggtom@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/71120/diff?detail_id=68175">diff</a>)</li></ul> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=711402019-02-05T22:27:33ZTom Cleggtom@curii.com
<ul></ul><p>Added one more fix that wasn't mentioned here: Log stderr from last boot-probe when giving up on boot.</p>
<p>14325-dispatch-cloud @ <a class="changeset" title="14325: Log stderr from last boot-probe when giving up on boot. Remove duplicate log message afte..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/ee53a267ded17bc50eaf4dfebba5ff4a3273753c">ee53a267ded17bc50eaf4dfebba5ff4a3273753c</a> <a class="external" href="https://ci.curoverse.com/view/Developer/job/developer-run-tests/1053/">https://ci.curoverse.com/view/Developer/job/developer-run-tests/1053/</a></p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=711442019-02-06T18:00:12ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote>
<p>Added one more fix that wasn't mentioned here: Log stderr from last boot-probe when giving up on boot.</p>
<p>14325-dispatch-cloud @ <a class="changeset" title="14325: Log stderr from last boot-probe when giving up on boot. Remove duplicate log message afte..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/ee53a267ded17bc50eaf4dfebba5ff4a3273753c">ee53a267ded17bc50eaf4dfebba5ff4a3273753c</a> <a class="external" href="https://ci.curoverse.com/view/Developer/job/developer-run-tests/1053/">https://ci.curoverse.com/view/Developer/job/developer-run-tests/1053/</a></p>
</blockquote>
<p>This LGTM, thanks.</p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=711722019-02-06T22:38:32ZTom Cleggtom@curii.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset <a class="changeset" title="Merge branch '14325-dispatch-cloud' closes #14325 Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tcl..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/800139c8dee7d9a563a8a2dca9e45e283c55c22c">arvados|800139c8dee7d9a563a8a2dca9e45e283c55c22c</a>.</p> Arvados - Feature #14325: [crunch-dispatch-cloud] Dispatch containers to cloud VMs directly, without slurm or nodemanagerhttps://dev.arvados.org/issues/14325?journal_id=719372019-03-01T19:33:55ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Release</strong> set to <i>15</i></li></ul>