<p><strong>Arvados - Bug #18941: arv-mount/python SDK performance should be better</strong></p>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102212">2022-03-29T14:12:26Z</a>)</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102213">2022-03-29T14:13:05Z</a>)</p>
<ul><li><strong>Subject</strong> changed from <i>arv-mount performance should be better</i> to <i>arv-mount/python SDK performance should be better</i></li></ul>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102229">2022-03-29T19:38:30Z</a>)</p>
<ul><li><strong>Release</strong> set to <i>46</i></li></ul><p>I'm curious how the performance improves with a local keepstore server</p>
<p>Goofys uses a 400 MB readahead buffer with 20 MB chunks.</p>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102245">2022-03-30T03:16:50Z</a>)</p>
<ul></ul><p>18941-arv-prefetch @ <a class="changeset" title="18941: Rename cache_slot_get option to 'prefetch' for clarity Arvados-DCO-1.1-Signed-off-by: Pet..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/d3051d45df5ee760a05a84bf4d385799c4326477">d3051d45df5ee760a05a84bf4d385799c4326477</a></p>
<p><a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/3004/">developer-run-tests: #3004 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&amp;build=3004" alt="" /></a></p>
<p>Found bugs in the prefetch logic:</p>
<p>When checking to see if a block was already in cache or should be prefetched, it was:</p>
<ul>
<li>using <code>get_from_cache()</code>, which only returns true if the block is fully downloaded, whereas what we care about is whether there is an in-flight GET request or not.</li>
<li>using the signed identifier, but blocks are keyed on the bare md5 sum.</li>
</ul>
<p>So, it would always think the block was not yet in cache, issuing numerous prefetch requests for the same block over and over, creating a traffic jam where all the prefetch threads were waiting on the same block instead of skipping duplicates and fetching the next required block.</p>
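<p>The corrected behavior can be sketched like this. This is an illustrative toy, not the actual <code>arvados.keep</code> code; all names here are hypothetical. It shows the two key ideas: key the cache on the bare md5 extracted from the (possibly signed) locator, and let only the first caller start a download so prefetch callers can skip blocks that are already in flight.</p>

```python
import re
import threading

class BlockCache:
    """Toy cache illustrating the prefetch fix (hypothetical, not the real SDK API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._slots = {}  # bare md5 -> slot (may still be downloading)

    @staticmethod
    def bare_md5(locator):
        # A signed locator looks like "<md5>+<size>+A<signature>@<expiry>";
        # cache entries are keyed on the bare md5 only.
        return re.match(r"[0-9a-f]{32}", locator).group(0)

    def reserve(self, locator):
        """Return (slot, first). first is True only for the caller that
        should actually start the download; a prefetch caller that sees
        first == False returns immediately instead of waiting."""
        key = self.bare_md5(locator)
        with self._lock:
            if key in self._slots:
                return self._slots[key], False
            slot = {"ready": threading.Event(), "data": None}
            self._slots[key] = slot
            return slot, True
```

<p>A prefetch thread calls <code>reserve()</code> and, when <code>first</code> is false, moves on to the next block instead of piling up behind an in-flight download; a foreground read waits on the slot's event instead.</p>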
<p>Fixes:</p>
<ul>
<li>Add a special "prefetch" option for <code>KeepClient.get()</code> which (a) starts downloading as normal if the block is not cached, or (b) returns immediately if the block already has a cache entry</li>
<li>This ensures that prefetch requests (a) only initiate downloads for blocks that are not yet in flight and (b) do not wait for in-flight requests to complete</li>
<li>Also removed the separate check to see if the block was in the cache; this could race with other threads and still result in more than one thread waiting on the same block fetch.</li>
<li>The number of prefetch "get" threads is now settable, and the prefetch window size is based on the number of prefetch threads</li>
<li>arv-get has a new --threads option and uses 4 threads by default</li>
<li>arv-mount sets the number of prefetch threads based on the keep cache size</li>
<li>Cleanup: removed an unused <code>is_cached</code> method (which could never have worked, because it contains a fatal typo)</li>
</ul>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102246">2022-03-30T03:53:45Z</a>)</p>
<ul></ul><p>1.7 GiB test file</p>
<table>
<tr>
<th>Configuration</th>
<th>Time to read whole file</th>
</tr>
<tr>
<td>arv-get</td>
<td>0m19.980s</td>
</tr>
<tr>
<td>arv-mount, default 256 MiB cache, local keepstore</td>
<td>0m25.396s</td>
</tr>
<tr>
<td>arv-mount, increased 384 MiB cache, local keepstore</td>
<td>0m22.354s</td>
</tr>
</table>
<p><code>time arv-get --threads 4 pirca-4zz18-qwyvdfzx68qok1y/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex /dev/null</code></p>
<p><code>time dd if=keep/by_id/pirca-4zz18-qwyvdfzx68qok1y/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex of=/dev/null</code></p>
<p>Reading the file includes loading and parsing the manifest and creating the Collection object.</p>
<p>With default settings, arv-mount uses 3 threads, while arv-get uses 4. Both use a 256 MiB cache by default; the difference is that arv-get is guaranteed to be doing a sequential read, whereas arv-mount is slightly more conservative to account for other read patterns. We should consider raising the clusterwide default "Container.DefaultKeepCacheRAM: 268435456" (256 MiB) to 402653184 (384 MiB).</p>
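<p>The byte values quoted for the config are plain powers-of-two arithmetic, which is easy to double-check:</p>

```python
# Verify the Container.DefaultKeepCacheRAM values quoted above.
MiB = 1024 ** 2
assert 256 * MiB == 268435456  # current default (256 MiB)
assert 384 * MiB == 402653184  # proposed default (384 MiB)
```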
<p>Ward observed what seems to be a cache warming effect on the S3 side, and I'm seeing it too: reading a file, unmounting, re-mounting, and then reading it again, measured performance goes from 130 MB/s (first read) to 180 MB/s (subsequent trials).</p>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102248">2022-03-30T03:55:06Z</a>)</p>
<ul><li><strong>Assigned To</strong> set to <i>Peter Amstutz</i></li></ul>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102288">2022-03-30T13:34:23Z</a>)</p>
<ul><li><strong>Target version</strong> changed from <i>2022-03-30 Sprint</i> to <i>2022-04-13 Sprint</i></li></ul>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102349">2022-03-30T18:03:19Z</a>)</p>
<p>I made one small change, increasing max threads from 6 to 7: <a class="changeset" title="18941: Tweak the prefetch thread max to 7 Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.am..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/0a9d4fa5043bd7291611c41588bdd3f0b70ede44">0a9d4fa5043bd7291611c41588bdd3f0b70ede44</a></p>
<p><strong>Updated by Lucas Di Pentima</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102350">2022-03-30T18:27:19Z</a>)</p>
<p>Some comments &amp; questions:</p>
<ul>
<li>"<code>arv keep get</code>" documentation may need updating at <code>sdk/cli/subcommands.html</code> to reflect the new parameter.</li>
<li>I think it would be good to have proper testing for this fix; the lack of tests is what allowed this bug to exist for so long. OTOH, if we need to ship this ASAP, the tests could be done in a separate branch, wdyt?</li>
<li>As discussed at sprint review, I think using the CPU core count as a factor for capping the max number of threads would be convenient. Maybe doing some manual tests on a big node to validate this would be desirable? If you have already done this test and it just works on <code>arv-get</code>, how about computing the default number of get threads from the CPU core count?</li>
</ul>
<p>The rest LGTM</p>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102353">2022-03-30T19:21:56Z</a>)</p>
<ul></ul><p>Lucas Di Pentima wrote:</p>
<blockquote>
<p>Some comments &amp; questions:</p>
<ul>
<li>"<code>arv keep get</code>" documentation may need updating at <code>sdk/cli/subcommands.html</code> to reflect the new parameter.</li>
</ul>
</blockquote>
<p>Will do.</p>
<blockquote>
<ul>
<li>I think it would be good to have proper testing for this fix; the lack of tests is what allowed this bug to exist for so long. OTOH, if we need to ship this ASAP, the tests could be done in a separate branch, wdyt?</li>
</ul>
</blockquote>
<p>Yes, we need to ship it.</p>
<p>It's very hard to write a test for this. There was already a test that the very basic prefetch behavior was happening, but <code>KeepClient.get()</code> was stubbed out. Since the core of the issue was multiple threads waiting on <code>KeepClient.get()</code> of the same block, a stub would still have hidden the problem. I'll have to think about it.</p>
<blockquote>
<ul>
<li>As discussed at sprint review, I think using the CPU core count as a factor for capping the max number of threads would be convenient. Maybe doing some manual tests on a big node to validate this would be desirable? If you have already done this test and it just works on <code>arv-get</code>, how about computing the default number of get threads from the CPU core count?</li>
</ul>
</blockquote>
<p>Right now I don't have enough information. My theory is that, because of the Python global interpreter lock and the additional work the FUSE driver does (compared to arv-get), twice as many cores would mean twice the lock contention, which would make things worse, not better. I don't want to leave it uncapped or use a formula for which I have no evidence that it will actually help.</p>
<blockquote>
<p>The rest LGTM</p>
</blockquote>
<p>Thanks.</p>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102358">2022-03-30T21:13:00Z</a>)</p>
<ul></ul><p>Benchmark using <code>dd if=(25 GB keep file) of=/dev/null</code></p>
<p>2.3.3</p>
<p><a class="external" href="https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-amqvijqbdoszk83#Log">https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-amqvijqbdoszk83#Log</a></p>
<p>2022-03-30T20:47:17.361472192Z 26691600151 bytes (27 GB, 25 GiB) copied, 666.32 s, 40.1 MB/s</p>
<p>2.4.0~rc1</p>
<p>On a 2 core node:</p>
<p><a class="external" href="https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-a99km88q375xpiy#Log">https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-a99km88q375xpiy#Log</a></p>
<p>2022-03-30T21:02:42.039596608Z 26691600151 bytes (27 GB, 25 GiB) copied, 222.505 s, 120 MB/s</p>
<p>On a 4 core node:</p>
<p><a class="external" href="https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-yx8wmx0qddaqxmq#Log">https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-yx8wmx0qddaqxmq#Log</a></p>
<p>2022-03-30T21:06:55.824180944Z 26691600151 bytes (27 GB, 25 GiB) copied, 152.218 s, 175 MB/s</p>
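<p>The MB/s figures dd reports are just bytes copied divided by elapsed seconds, in SI megabytes. A quick sanity check of the numbers above:</p>

```python
# Verify the dd-reported throughput figures: bytes / seconds / 1e6 = SI MB/s.
nbytes = 26691600151
runs = [
    (666.32, 40.1),    # 2.3.3
    (222.505, 120.0),  # 2.4.0~rc1, 2 cores
    (152.218, 175.0),  # 2.4.0~rc1, 4 cores
]
for secs, reported in runs:
    mb_s = nbytes / secs / 1e6
    assert abs(mb_s - reported) < 1.0, (secs, mb_s)
```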
<p>On a 8 core node:</p>
<p><a class="external" href="https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-30vfuzz3x60j3if#Log">https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-30vfuzz3x60j3if#Log</a></p>
<p>2022-03-30T21:12:09.440730978Z 26691600151 bytes (27 GB, 25 GiB) copied, 137.416 s, 194 MB/s</p>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102395">2022-04-01T14:19:37Z</a>)</p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-1 priority-4 priority-default" href="/issues/18964">Bug #18964</a>: Write better prefetch tests</i> added</li></ul>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102397">2022-04-01T14:19:57Z</a>)</p>
<p>Added separate story for testing: <a class="issue tracker-1 status-1 priority-4 priority-default" title="Bug: Write better prefetch tests (New)" href="https://dev.arvados.org/issues/18964">#18964</a></p>
<p><strong>Updated by Peter Amstutz</strong> (<a href="https://dev.arvados.org/issues/18941?journal_id=102398">2022-04-01T14:20:28Z</a>)</p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul>