Arvados - Bug #19414: keep-balance panic: concurrent map read and map write
https://dev.arvados.org/issues/19414?journal_id=105815, 2022-08-23T14:42:40Z, Tom Clegg (tom@curii.com)
<ul><li><strong>Target version</strong> set to <i>2022-08-31 sprint</i></li></ul>
https://dev.arvados.org/issues/19414?journal_id=105817, 2022-08-23T15:03:30Z, Tom Clegg (tom@curii.com)
<p>19414-keep-balance-panic @ <a class="changeset" title="19414: Fix concurrent map read/write. Occurred when a block was referenced by a collection but n..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/58266ff0dc0420cd99c4cb024476115a3dd9b5e7">58266ff0dc0420cd99c4cb024476115a3dd9b5e7</a> -- <a class="external" href="https://ci.arvados.org/job/developer-run-tests/3270/">developer-run-tests: #3270 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&amp;build=3270" alt="" /></a></p>
<p>(wb1 failed)</p>
https://dev.arvados.org/issues/19414?journal_id=105822, 2022-08-23T16:53:51Z, Lucas Di Pentima (lucas.dipentima@curii.com)
<p>Although I understand the fix, I don't get the "...when NumCPU > 2" comment: why is that necessary for the bug to happen?</p>
<p>LGTM, thanks!</p>
https://dev.arvados.org/issues/19414?journal_id=105824, 2022-08-23T17:25:31Z, Tom Clegg (tom@curii.com)
<blockquote>
<p>when NumCPU > 2</p>
</blockquote>
<p>GetConfirmedReplication() is called from a goroutine that is started once per iteration of this loop in <a class="source" href="https://dev.arvados.org/projects/arvados/repository/arvados/entry/services/keep-balance/collection.go">source:services/keep-balance/collection.go</a>:</p>
<pre>
// Use about 1 goroutine per 2 CPUs. Based on experiments with
// a 2-core host, using more concurrent database
// calls/transactions makes this process slower, not faster.
for i := 0; i < runtime.NumCPU()+1/2; i++ {
</pre>
<p>...which, now that I'm paying attention, means NumCPU+0 (division binds tighter than addition, and the integer division 1/2 is 0), but was surely meant to be</p>
<pre>
for i := 0; i < (runtime.NumCPU()+1)/2; i++ {
</pre>
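<p>A quick way to check the arithmetic is a standalone Go program that evaluates both loop bounds for a few CPU counts. This is a minimal sketch: buggyWorkers and fixedWorkers are hypothetical helper names (not keep-balance code), and they take the CPU count as a parameter instead of calling runtime.NumCPU() so the output does not depend on the host.</p>
<pre>
package main

import "fmt"

// buggyWorkers mirrors the original loop bound. In Go, / binds tighter
// than +, and 1/2 is constant integer division (== 0), so this is n+0.
func buggyWorkers(numCPU int) int {
	return numCPU + 1/2
}

// fixedWorkers mirrors the intended "about 1 goroutine per 2 CPUs",
// rounding up so that a 1-CPU host still gets one goroutine.
func fixedWorkers(numCPU int) int {
	return (numCPU + 1) / 2
}

func main() {
	// Prints:
	// NumCPU=1 buggy=1 fixed=1
	// NumCPU=2 buggy=2 fixed=1
	// NumCPU=3 buggy=3 fixed=2
	// NumCPU=4 buggy=4 fixed=2
	// NumCPU=8 buggy=8 fixed=4
	for _, n := range []int{1, 2, 3, 4, 8} {
		fmt.Printf("NumCPU=%d buggy=%d fixed=%d\n", n, buggyWorkers(n), fixedWorkers(n))
	}
}
</pre>
<p>With the original bound, a 2-core host already starts two goroutines; with the intended bound, a second goroutine appears only at 3 or more CPUs, which is where the "NumCPU > 2" condition quoted above comes from, assuming the intended bound.</p>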
<p>...which means that, on a single-core machine (or a 2-core machine, after fixing the parens), only one goroutine calls GetConfirmedReplication, so there is no opportunity for a concurrent map read/write.</p>
https://dev.arvados.org/issues/19414?journal_id=105826, 2022-08-23T19:24:37Z, Tom Clegg (tom@curii.com)
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset arvados-private:commit:arvados|69158dc93fdfec57279ba227f872f3a7c01c4e78.</p>
https://dev.arvados.org/issues/19414?journal_id=105854, 2022-08-25T13:55:12Z, Tom Clegg (tom@curii.com)
<ul><li><strong>File</strong> <a href="/attachments/3058">keep-balance</a> added</li></ul><p>19414-backport-2.3.2 @ <a class="changeset" title="19414: Fix missing parens. Arvados-DCO-1.1-Signed-off-by: Tom Clegg &lt;tom@curii.com&gt;" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/c3268207eeb57686c29a0f846bba1a7e6f135622">c3268207eeb57686c29a0f846bba1a7e6f135622</a></p>
https://dev.arvados.org/issues/19414?journal_id=106449, 2022-09-19T16:41:53Z, Peter Amstutz (peter.amstutz@curii.com)
<ul><li><strong>Release</strong> set to <i>53</i></li></ul>