https://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422018-11-07T17:35:14ZArvadosArvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=684762018-11-07T17:35:14ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=684772018-11-07T17:35:38ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/68477/diff?detail_id=65532">diff</a>)</li></ul> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=684792018-11-07T17:36:20ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-3 priority-4 priority-default closed parent" href="/issues/14198">Feature #14198</a>: [CWL] run steps on remote clusters</i> added</li></ul> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=684842018-11-07T20:30:07ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Assigned To</strong> set to <i>Peter Amstutz</i></li></ul> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=684942018-11-07T20:57:18ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>14458-controller-panic @ <a class="changeset" title="14458: Avoid panic by removing defer close() Channels will be garbage collected when they go unr..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/925f7e4c35fb335b3f57000371592db9f2206766">925f7e4c35fb335b3f57000371592db9f2206766</a></p> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=684962018-11-07T21:42:03ZTom Cleggtom@curii.com
<ul></ul><p>With a buffered errorChan, it seems like wg.Done() and cancelFunc() can be called after the error is sent but before <-errorChan wins the select race, which means <-sharedContext.Done() can win the select race while the errors slice is empty, even though there is in fact an error to report.</p>
<p>How about making <code>errorChan:=make(chan error, len(h.handler.Cluster.RemoteClusters))</code> and, when Done(), collecting errors in a <code>for len(errorChan)>0</code> loop? Seems like we only care about the errors if we get to the "give up" case, so in the meantime accumulating them in a channel seems more convenient than accumulating them in a slice. <a class="issue tracker-2 status-3 priority-4 priority-default closed parent" title="Feature: [Controller] Specify runtime_token when creating container requests on a remote cluster (Resolved)" href="https://dev.arvados.org/issues/14262#note-12">#14262#note-12</a></p> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=685122018-11-08T16:01:52ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote>
<p>With a buffered errorChan, it seems like wg.Done() and cancelFunc() can be called after the error is sent but before <-errorChan wins the select race, which means <-sharedContext.Done() can win the select race while the errors slice is empty, even though there is in fact an error to report.</p>
</blockquote>
<p>Here's my reasoning:</p>
<ol>
<li>wg.Wait() unblocks to call cancelFunc() only after all goroutines have completed.</li>
<li>At that point, every goroutine that is going to produce an error will have already put its error in the buffered channel</li>
<li>In the loop, the receive on the error channel has priority over the cancellation channel</li>
<li>Therefore, the error channel will be drained before matching the <-sharedContext.Done() case</li>
</ol>
<p>... so on further reading, I see that the select statement in Go don't actually have priority between channels, so it won't work like that. Argh!</p>
<p>(So the real reason I originally wrote this code using mutexes and shared state is that it was frankly easier for me to reason about than Go channels and all their edge cases)</p>
<p>You're right, it is simpler to keep them in the buffered channel and drain it when it is done.</p>
<p>Now 14458-controller-panic @ <a class="changeset" title="14458: Drain errors channel on failure case instead of accumulating separately Arvados-DCO-1.1-S..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/6938e8cff1632d597cfbd333e4d5176805b628c6">6938e8cff1632d597cfbd333e4d5176805b628c6</a></p> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=685172018-11-08T17:36:56ZTom Cleggtom@curii.com
<ul></ul><p>With <code>for range errChan</code>, the behavior changes a bit: if the parent context cancels, the handler blocks until all of the remote-req goroutines notice that and return. (Previously, and still in the success case, the handler returns immediately, letting the remote-req goroutines run until they notice the cancelled context.) Is this deliberate? I suggested "for len(errChan)>0" because I thought the "return immediately, let goroutines run" strategy seemed sensible to use across the board -- e.g., the "request finished" log message would happen right away.</p>
<p>LGTM</p> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=685212018-11-08T18:28:51ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Tom Clegg wrote:</p>
<blockquote>
<p>With <code>for range errChan</code>, the behavior changes a bit: if the parent context cancels, the handler blocks until all of the remote-req goroutines notice that and return. (Previously, and still in the success case, the handler returns immediately, letting the remote-req goroutines run until they notice the cancelled context.) Is this deliberate? I suggested "for len(errChan)>0" because I thought the "return immediately, let goroutines run" strategy seemed sensible to use across the board -- e.g., the "request finished" log message would happen right away.</p>
</blockquote>
<p>No, that wasn't deliberate, I wasn't thinking about the case of the parent context being cancelled. I see what you mean, draining the buffer but not waiting for it to close would allow it to return slightly earlier if there are outstanding requests.</p>
<blockquote>
<p>LGTM</p>
</blockquote>
<p>Changed <code>range errorChan</code> to <code>len(errorChan) > 0</code> and merged.</p> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=688082018-11-14T15:36:29ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul> Arvados - Bug #14458: [controller] collection federation panic send on closed channelhttps://dev.arvados.org/issues/14458?journal_id=694562018-12-03T22:45:16ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Release</strong> set to <i>14</i></li></ul>