<p><strong>Arvados - Feature #18842: Local disk keep cache for Python SDK/arv-mount</strong> (<a href="https://dev.arvados.org/issues/18842">https://dev.arvados.org/issues/18842</a>)</p>
<p><strong>Updated by Peter Amstutz on 2022-03-08</strong></p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/101412/diff?detail_id=97969">diff</a>)</li></ul>
<p><strong>Updated by Peter Amstutz on 2022-03-21</strong></p>
<ul><li><strong>Is duplicate of</strong> <i><a class="issue tracker-6 status-5 priority-4 priority-default closed" href="/issues/3640">Idea #3640</a>: [SDKs] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache.</i> added</li></ul>
<p><strong>Updated by Peter Amstutz on 2022-08-31</strong></p>
<ul><li><strong>Target version</strong> set to <i>2022-09-14 sprint</i></li></ul>
<p><strong>Updated by Peter Amstutz on 2022-08-31</strong></p>
<ul><li><strong>Assigned To</strong> set to <i>Peter Amstutz</i></li></ul>
<p><strong>Updated by Peter Amstutz on 2022-09-14</strong></p>
<ul><li><strong>Target version</strong> changed from <i>2022-09-14 sprint</i> to <i>2022-10-12 sprint</i></li></ul>
<p><strong>Updated by Peter Amstutz on 2022-09-27</strong></p>
<ul><li><strong>Target version</strong> changed from <i>2022-10-12 sprint</i> to <i>2022-10-26 sprint</i></li></ul>
<p><strong>Updated by Peter Amstutz on 2022-10-18</strong></p>
<ul></ul><p>18842-arv-mount-disk-cache @ <a class="changeset" title="18842: Add disk cache, test python sdk/fuse disk_cache=true/false Off by default in the SDK but ..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/67e56f190b9a78e3c45cc7d90510fc631e0d04b6">67e56f190b9a78e3c45cc7d90510fc631e0d04b6</a></p>
<ul>
<li>Add DiskCacheSlot which takes a block, writes it to disk, then returns a memory mapped object</li>
<li>this "just works" with the existing zero copy approach based on memoryview</li>
<li>uses atomic write/rename and organizes blocks in directories the same way that keepstore does it</li>
<li>if a block already exists on disk (but not in the list of cached/mmaped blocks) it will read it and check the md5</li>
<li>sets own cache limits based on process open file descriptor limit & available disk space </li>
<li>from a warm start where the blocks exist on disk but arv-mount is freshly restarted, data is served very quickly (and avoids re-downloading blocks)</li>
<li>I originally thought of having a 4 GiB default limit; I bumped it up to 8 GiB, but I'm still mulling it over</li>
<li>cap_cache deletes blocks from disk that it knows about if it is exceeding the cache limits</li>
</ul>
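<p>The write path described above can be sketched roughly like this (a hypothetical simplification; the real DiskCacheSlot in the Python SDK differs in detail, and the name <code>put_block</code> is illustrative):</p>

```python
import hashlib
import mmap
import os
import tempfile

def put_block(cachedir, blob):
    """Write a block to the cache with an atomic write/rename, then
    return a read-only memory-mapped view of it (illustrative sketch)."""
    loc = hashlib.md5(blob).hexdigest()
    # Organize blocks into subdirectories keyed by a hash prefix,
    # the same way keepstore lays out its volumes.
    subdir = os.path.join(cachedir, loc[0:3])
    os.makedirs(subdir, exist_ok=True)
    final = os.path.join(subdir, loc)
    # Write to a temp file in the same directory, then rename into
    # place, so readers never observe a partially written block.
    fd, tmpname = tempfile.mkstemp(dir=subdir)
    try:
        os.write(fd, blob)
        os.rename(tmpname, final)
    finally:
        os.close(fd)
    # Map the finished file; a memoryview over the mmap keeps the
    # existing zero-copy read path working unchanged.
    with open(final, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```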
<p>A couple details I haven't quite decided on:</p>
<ul>
<li>The cache directory is shared between processes. This enables them to share each other's blocks, but each one managing space independently could also cause them to delete each other's blocks; this won't crash anything, but it could be inefficient</li>
<li>Right now it also doesn't do anything with the cache directory when arv-mount starts or stops; it just leaves the blocks in place, so if you access different things every time, it could still grow forever</li>
</ul>
<p>Mulling this over: since I need to keep the files open anyhow, I can do something with file locks.</p>
<p>I'll follow up with some rough benchmarks.</p>
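<p>Since each mapped block already holds an open file, flock is a natural fit: each process can take a shared lock on every block it has mapped, and eviction can require a non-blocking exclusive lock first. A rough sketch of that idea (hypothetical; <code>try_evict</code> is not the SDK's actual API):</p>

```python
import fcntl
import os

def try_evict(path):
    """Delete a cached block only if no other process holds it locked.
    Illustrative sketch: processes sharing a cache directory flock the
    blocks they have mapped, so they won't delete blocks out from
    under each other."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another process has the block flocked (i.e. mapped); leave it.
        os.close(fd)
        return False
    os.unlink(path)
    os.close(fd)
    return True
```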
<p><a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/3326/">developer-run-tests: #3326</a></p>
<p><strong>Updated by Peter Amstutz on 2022-10-20</strong></p>
<ul></ul><p>18842-arv-mount-disk-cache @ <a class="changeset" title="18842: put the file locking back We don't strictly speaking need file locking for correctness, b..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/356fa1d4d17c402470a6b49ae943e7ec278b1a46">356fa1d4d17c402470a6b49ae943e7ec278b1a46</a></p>
<ul>
<li>Maps in everything at start and prunes any excess blocks.</li>
</ul>
<p><a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/3329/">developer-run-tests: #3329</a></p>
<p>TODO: figure out what config.yml options need to be added &amp; make crunch-run use them</p>
<p><strong>Updated by Peter Amstutz on 2022-10-21</strong></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
<p><strong>Updated by Peter Amstutz on 2022-10-26</strong></p>
<ul></ul><p>18842-arv-mount-disk-config @ <a class="changeset" title="18842: Fix test Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/858a8ce2394b57540f95fa794e894e3315508c8b">858a8ce2394b57540f95fa794e894e3315508c8b</a></p>
<ul>
<li>Adds keep_cache_disk option to runtime_constraints, and DefaultKeepCacheRAM and DefaultKeepCacheDisk configurations</li>
<li>crunch-run uses disk cache if keep_cache_disk > 0, otherwise you get the RAM cache
<ul>
<li>This maintains backwards compatibility with older versions of arvados-cwl-runner. If disk cache isn't specified in the request, you get the old behavior</li>
<li>Changed the config.yml default to use disk cache</li>
</ul>
</li>
<li>Incorporates disk cache in the disk space estimate for the request</li>
<li>Updated tests</li>
</ul>
<p><a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/3342/">developer-run-tests: #3342</a></p>
<p><strong>Updated by Peter Amstutz on 2022-10-26</strong></p>
<ul><li><strong>Target version</strong> changed from <i>2022-10-26 sprint</i> to <i>2022-11-09 sprint</i></li></ul>
<p><strong>Updated by Tom Clegg on 2022-10-31</strong></p>
<p>Should we make it a config error to have ram>0 and disk>0? Currently in that case we reserve both, but don't use the ram.</p>
<p>The upgrade instructions don't explicitly say to update the python-arvados-fuse etc. packages on your slurm/lsf nodes -- is this worth adding? (We do mention upgrading the compute image for cloud.)</p>
<p>I'm not sure the parameterized tests are really working.</p>
<p>First, this block_cache class variable in mount_test_base.py should really be disk_cache, so it's controlled by the parameterized tests with {"disk_cache": True}, right?</p>
<pre>
class MountTestBase(unittest.TestCase):
    block_cache = False
    ...
    make_block_cache(self.block_cache)
</pre>
<p>Even with this change, disk_cache=True doesn't seem to propagate everywhere like I expected. I tried showing the block_cache arg and traceback.print_stack() in KeepClient.__init__ and found that, for example, the collection in FuseCreateFileTest.runTest() seems to be setting up its own KeepClient that uses a memory cache:</p>
<pre>
KeepClient.__init__: block_cache = None
...
File "/home/tom/arvados/services/fuse/tests/test_mount.py", line 431, in runTest
collection.save_new()
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/arvfile.py", line 469, in must_be_writable_wrapper
return orig_func(self, *args, **kwargs)
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/arvfile.py", line 270, in synchronized_wrapper
return orig_func(self, *args, **kwargs)
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/retry.py", line 177, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/collection.py", line 1732, in save_new
self._my_block_manager().commit_all()
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/arvfile.py", line 270, in synchronized_wrapper
return orig_func(self, *args, **kwargs)
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/collection.py", line 1429, in _my_block_manager
self._block_manager = _BlockManager(self._my_keep(),
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/arvfile.py", line 270, in synchronized_wrapper
return orig_func(self, *args, **kwargs)
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/collection.py", line 1420, in _my_keep
self._keep_client = KeepClient(api_client=self._api_client)
File "/home/tom/arvados/tmp/VENV3DIR/lib/python3.9/site-packages/arvados/keep.py", line 897, in __init__
traceback.print_stack()
KeepBlockCache: self._disk_cache = False
</pre>
<p>I haven't tried to figure out whether the propagation happens properly in the non-testing code, although it seems like we'd already have memory explosion problems if the regular fuse code paths caused multiple block caches to be created. Is it worth adding something like "error out if some code path instantiates the wrong kind of block cache" to the testing setup?</p>
<p><strong>Updated by Tom Clegg on 2022-10-31</strong></p>
<ul></ul><p>I see a couple of these. Is this really okay? The one in DiskCacheSlot.set() looks suspicious -- won't that result in a cache slot that behaves as if everything is OK but returns empty content?</p>
<pre><code class="python syntaxhl"> <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="n">traceback</span><span class="p">.</span><span class="n">print_exc</span><span class="p">()</span>
</code></pre>
<p>We should probably handle errors like these at least gracefully enough to suggest switching to memory cache:</p>
<ul>
<li>ENODEV The underlying filesystem of the specified file does not support memory mapping.</li>
<li>ENOMEM The process's maximum number of mappings would have been exceeded.</li>
</ul>
<p>It looks like ENOSPC will cause DiskCacheSlot.set() to start leaving partially-written temp files in the cache dir (as well as printing tracebacks), and the temp files won't get cleaned up until a future init_cache() hits max slots and starts deleting old files. Perhaps we could:</p>
<ul>
<li>in set(), delete the temp file if we hit an exception before renaming it into place</li>
<li>in init_cache(), delete temp files that are > 5 seconds old and aren't flocked</li>
</ul>
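<p>The proposed cleanup could look roughly like this (a sketch only; <code>CACHE_EXT</code>, the <code>tmp</code> prefix, and <code>clean_stale_tempfiles</code> are hypothetical names, not the SDK's actual pattern):</p>

```python
import errno
import os
import time

CACHE_EXT = ".keepcacheblock"   # hypothetical recognizable suffix

def clean_stale_tempfiles(cachedir, max_age=5):
    """Remove partially-written temp files left behind by a crashed or
    ENOSPC-interrupted writer. Only files matching the recognizable
    temp pattern are candidates, and only once they are older than
    max_age seconds, so an in-progress write (or an unrelated file
    from a misconfigured --cache-dir) is never touched."""
    now = time.time()
    for dirpath, _, filenames in os.walk(cachedir):
        for name in filenames:
            if not name.startswith("tmp") or not name.endswith(CACHE_EXT):
                continue
            path = os.path.join(dirpath, name)
            try:
                if now - os.path.getmtime(path) > max_age:
                    os.remove(path)
            except OSError as e:
                # Already removed by another process racing us is fine.
                if e.errno != errno.ENOENT:
                    raise
```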
<p>We should use a recognizable naming pattern for tempfiles, and check that a filename matches hash or tempfile pattern before deleting it -- currently I suspect a misconfiguration like <code>--cache-dir=$HOME</code> could be silently disastrous.</p>
<p>Defaulting to 1/2 of available disk space, up to 64 GiB, seems excessive:</p>
<ul>
<li>on a system with several users, 64 GiB per user is an awful lot of space for "things that went through arv-get or arv-put some time ago" (arv-get/put will use this, right?)</li>
<li>successive invocations would use smaller and smaller caches (for the single-user case we could fix this by adding the existing cache size when calculating free space)</li>
</ul>
<p>Perhaps min(1G, 0.1*total_disk_size) would be a reasonable default?</p>
<p><strong>Updated by Peter Amstutz on 2022-11-09</strong></p>
<ul><li><strong>Target version</strong> changed from <i>2022-11-09 sprint</i> to <i>2022-11-23 sprint</i></li></ul>
<p><strong>Updated by Peter Amstutz on 2022-11-22</strong></p>
<ul></ul><p>Tom Clegg wrote in <a href="#note-13">#note-13</a>:</p>
<blockquote>
<p>Should we make it a config error to have ram>0 and disk>0? currently in that case we reserve both, but don't use the ram.</p>
</blockquote>
<p>It's sort of harmless but yea, we could make it a warning or error.</p>
<blockquote>
<p>The upgrade instructions don't explicitly say to update the python-arvados-fuse etc. packages on your slurm/lsf nodes -- is this worth adding? (We do mention upgrading the compute image for cloud.)</p>
<p>I'm not sure the parameterized tests are really working.</p>
<p>First, this block_cache class variable in mount_test_base.py should really be disk_cache, so it's controlled by the parameterized tests with {"disk_cache": True}, right?</p>
</blockquote>
<p>Yep, that was a mistake. Fixed.</p>
<blockquote>
<p>[...]</p>
<p>Even with this change, disk_cache=True doesn't seem to propagate everywhere like I expected. I tried showing the block_cache arg and traceback.print_stack() in KeepClient.__init__ and found that, for example, the collection in FuseCreateFileTest.runTest() seems to be setting up its own KeepClient that uses a memory cache:</p>
<p>[...]</p>
</blockquote>
<p>I think those cases were all in places where it was creating a Collection object to write files, so the disk cache flag didn't really matter. What I did do was add a check to the constructor to look for ThreadsafeApiCache objects and use the KeepClient from there, instead of creating a new one. So I think this ensures it is actually using the desired KeepClient object with the disk cache in those cases.</p>
<blockquote>
<p>I haven't tried to figure out whether the propagation happens properly in the non-testing code, although it seems like we'd already have memory explosion problems if the regular fuse code paths caused multiple block caches to be created. Is it worth adding something like "error out if some code path instantiates the wrong kind of block cache" to the testing setup?</p>
</blockquote>
<p>I'd have to set up some testing-only global flag that it would have to check against; I don't know if that's really necessary.</p>
<p>Tom Clegg wrote in <a href="#note-14">#note-14</a>:</p>
<blockquote>
<p>I see a couple of these. Is this really okay? The one in DiskCacheSlot.set() looks suspicious -- won't that result in a cache slot that behaves as if everything is OK but returns empty content?</p>
<p>[...]</p>
</blockquote>
<p>When <code>ready</code> is set and <code>content</code> is <code>None</code>, that means "a cache slot was reserved for this, but no value was provided, so the read must have failed" -- in that case it will try to read the block from a different keepstore.</p>
<blockquote>
We should probably handle errors like these at least gracefully enough to suggest switching to memory cache:
<ul>
<li>ENODEV The underlying filesystem of the specified file does not support memory mapping.</li>
<li>ENOMEM The process's maximum number of mappings would have been exceeded.</li>
</ul>
</blockquote>
<p>It now catches those errors and gives appropriate messages. In the case of a failure, it falls back directly to RAM caching behavior (it sets <code>self.content = value</code>).</p>
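<p>The fallback described here might look something like this (hypothetical sketch; <code>map_or_fallback</code> is not the actual SDK function, and <code>slot</code> is any object with a <code>content</code> attribute):</p>

```python
import errno
import mmap

def map_or_fallback(slot, filehandle, blob):
    """Try to mmap the cached file; on ENODEV/ENOMEM, report the
    problem and keep the block in RAM instead (illustrative sketch)."""
    try:
        slot.content = mmap.mmap(filehandle.fileno(), 0,
                                 access=mmap.ACCESS_READ)
    except OSError as e:
        if e.errno == errno.ENODEV:
            msg = "filesystem does not support memory mapping"
        elif e.errno == errno.ENOMEM:
            msg = "process mapping limit exceeded"
        else:
            raise
        print("disk cache unavailable (%s), keeping block in RAM" % msg)
        slot.content = blob   # same behavior as the memory cache
```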
<blockquote>
It looks like ENOSPC will cause DiskCacheSlot.set() to start leaving partially-written temp files in cache dir (as well as printing tracebacks), and the temp files won't get cleaned up until a future init_cache() hits max slots and starts deleting old files. Perhaps we could
<ul>
<li>in set(), delete the temp file if we hit an exception before renaming it into place</li>
<li>in init_cache(), delete temp files that are > 5 seconds old and aren't flocked</li>
</ul>
</blockquote>
<p>Done.</p>
<blockquote>
<p>We should use a recognizable naming pattern for tempfiles, and check that a filename matches hash or tempfile pattern before deleting it -- currently I suspect a misconfiguration like <code>--cache-dir=$HOME</code> could be silently disastrous.</p>
</blockquote>
<p>Good idea. Blocks get an extension and now it won't mess with any files that don't have that extension.</p>
<blockquote>
Defaulting to 1/2 available disk space up to 64 GiB seems excessive.
<ul>
<li>on a system with several users, 64 GiB per user is an awful lot of space for "things that went through arv-get or arv-put some time ago" (arv-get/put will use this, right?)</li>
<li>successive invocations would use smaller and smaller caches (for the single-user case we could fix this by adding the existing cache size when calculating free space)</li>
</ul>
<p>Perhaps min(1G, 0.1*total_disk_size) would be a reasonable default?</p>
</blockquote>
<p>The calculation is now the smallest of:</p>
<ul>
<li>10% of total disk size</li>
<li>25% of available space</li>
<li>max_slots * 64 MiB</li>
</ul>
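<p>That calculation can be sketched as follows (illustrative only; the helper name and the file-descriptor headroom factor are assumptions, not the SDK's actual code -- the real max_slots derivation is based on the process open file descriptor limit, since each mapped block holds a descriptor open):</p>

```python
import resource
import shutil

BLOCK_SIZE = 64 * 1024 * 1024   # Keep blocks are at most 64 MiB

def default_cache_size(cachedir):
    """Default disk cache limit: the smallest of 10% of total disk
    size, 25% of available space, and max_slots * 64 MiB."""
    usage = shutil.disk_usage(cachedir)
    soft_nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Leave some descriptors for everything that isn't a cache block
    # (the 1/2 factor here is an assumption for illustration).
    max_slots = max(soft_nofile // 2, 1)
    return min(usage.total // 10,
               usage.free // 4,
               max_slots * BLOCK_SIZE)
```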
<p>I added a few new tests specifically for disk caching behavior.</p>
<p>18842-arv-mount-disk-config @ <a class="changeset" title="18842: Added tests for specific disk cache behavior Arvados-DCO-1.1-Signed-off-by: Peter Amstutz..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/2cddd3c4e531e37c6d960c08da91a552bf75a1cc">2cddd3c4e531e37c6d960c08da91a552bf75a1cc</a></p>
<p><a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/3391/">developer-run-tests: #3391</a></p>
<p><strong>Updated by Peter Amstutz on 2022-11-23</strong></p>
<ul><li><strong>Target version</strong> changed from <i>2022-11-23 sprint</i> to <i>2022-12-07 Sprint</i></li></ul>
<p><strong>Updated by Tom Clegg on 2022-11-25</strong></p>
<p>If we handle errors by caching the block in memory, there seem to be a couple of not-too-unlikely scenarios (can't create/write cache dir, disk full) where we end up caching in memory but using a size limit that was chosen based on disk space -- either specified by the caller or a default based on total/available disk space -- which will often be unrealistic for a memory cache. I'm not sure of the best way to handle this (or how close to "best" we really need to get). Thoughts:</p>
<ul>
<li>If the cache dir is completely unusable at startup (can't even create/replace a one-byte file in DiskCacheSlot.__init__()) maybe it's better to error out right away? (if so, this seems easy)</li>
<li>If things are working but then we hit ENOSPC when we're at 12G of our 25G limit, maybe it's better to reduce the limit to 10G, evict some old blocks to get down to the new limit, and retry? (this seems a little less easy)</li>
<li>Is it feasible to set a smaller limit for the fall-back-to-memory cache -- 4 blocks, a percentage of RAM, or ...? I don't see a convenient way to do this with the current code structure but maybe you do?</li>
</ul>
<p>When disk cache fails, there's at least some chance you have enough memory to succeed with the current code, so giving up too easily would be unfortunate. OTOH, automatically escalating a variety of disk errors into OOM errors would also be unfortunate.</p>
<p>Rest LGTM.</p>
<p><strong>Updated by Tom Clegg on 2022-11-28</strong></p>
<ul></ul><blockquote>
<p>The calculation is now the smallest of:</p>
<ul>
<li>10% of total disk size</li>
<li>25% of available space</li>
<li>max_slots * 64 MiB</li>
</ul>
</blockquote>
<p>I think this means we need to add some stuff like <code>needScratch = max(needScratch, ctr.RuntimeConstraints.KeepCacheDisk*10)</code> to lib/dispatchcloud/node_size.go.</p>
<p>I think we also need to add the pre-existing cache usage to "avail" before clamping to that limit.</p>
<p>Comment typo: "256 GiB in RAM" should be "256 MiB" (phew).</p>
<p><strong>Updated by Peter Amstutz on 2022-11-28</strong></p>
<ul></ul><p>Tom Clegg wrote in <a href="#note-18">#note-18</a>:</p>
<blockquote>
<ul>
<li>If the cache dir is completely unusable at startup (can't even create/replace a one-byte file in DiskCacheSlot.__init__()) maybe it's better to error out right away? (if so, this seems easy)</li>
</ul>
</blockquote>
<p>Done, now writes a one-byte block at startup to catch early errors.</p>
<blockquote>
<ul>
<li>If things are working but then we hit ENOSPC when we're at 12G of our 25G limit, maybe it's better to reduce the limit to 10G, evict some old blocks to get down to the new limit, and retry? (this seems a little less easy)</li>
</ul>
</blockquote>
<p>Done. When it gets ENOSPC or ENOMEM, it adjusts the limit down and tries again.</p>
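<p>The shrink-and-retry behavior might be sketched like this (hypothetical; the function name, the 80% reduction factor, and the shape of <code>cache</code> are illustrative, not the SDK's actual code):</p>

```python
import errno

def write_with_retry(slot, blob, cache):
    """On ENOSPC/ENOMEM, shrink the cache limit below current usage,
    evict old blocks down to the new limit, and retry the write once.
    `cache` is any object with `usage`, `limit`, and `cap_cache()`."""
    try:
        slot.set(blob)
    except OSError as e:
        if e.errno not in (errno.ENOSPC, errno.ENOMEM):
            raise
        # Reduce the limit below what we're currently holding, evict
        # down to the new limit, then try the write again.
        cache.limit = min(cache.limit, cache.usage) * 80 // 100
        cache.cap_cache()
        slot.set(blob)
```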
<blockquote>
<ul>
<li>Is it feasible to set a smaller limit for the fall-back-to-memory cache -- 4 blocks, a percentage of RAM, or ...? I don't see a convenient way to do this with the current code structure but maybe you do?</li>
</ul>
<p>When disk cache fails, there's at least some chance you have enough memory to succeed with the current code, so giving up too easily would be unfortunate. OTOH, automatically escalating a variety of disk errors into OOM errors would also be unfortunate.</p>
</blockquote>
<p>As discussed at standup, the choice is either to fall back to RAM cache behavior or raise errors. I think we decided that if things have really gone screwy, it's better to surface the error than to hide it, so that is what it does now.</p>
<blockquote>
<p>I think this means we need to add some stuff like <code>needScratch = max(needScratch, ctr.RuntimeConstraints.KeepCacheDisk*10)</code> to lib/dispatchcloud/node_size.go.</p>
</blockquote>
<p>Ah, no. The calculation I referred to was the default cache if unspecified. If you provide an explicit size (which is what <code>crunch-run</code> does) then it just uses exactly that size. It already adds KeepCacheDisk to <code>needScratch</code> in EstimateScratchSpace().</p>
<blockquote>
<p>I think we also need to add the pre-existing cache usage to "avail" before clamping to that limit.</p>
</blockquote>
<p>Good idea. Done.</p>
<blockquote>
<p>comment typo "256 GiB in RAM" should be 256 MiB (phew)</p>
</blockquote>
<p>Fixed.</p>
<p>18842-arv-mount-disk-config @ <a class="changeset" title="18842: Turn disk write errors into KeepCacheError * DiskCacheSlot properly keeps track of the fi..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/650265966e83fca4ce8e9a416e9b5e358e82be98">650265966e83fca4ce8e9a416e9b5e358e82be98</a></p>
<p><a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/3393/">developer-run-tests: #3393</a></p>
<p><strong>Updated by Tom Clegg on 2022-12-02</strong></p>
<ul></ul><p>In DiskCacheSlot.evict() would it be a good idea to have an explicit self.content.close() and self.filehandle.close() instead of just assigning None?</p>
<p>In DiskCacheSlot.set(), don't we need to call self.ready.set() after assigning content = mmap() so get() doesn't wait forever? (The memory cache does so, OTOH none of the tests deadlock, so maybe I'm missing something.)</p>
<p>Rest LGTM.</p>
<p><strong>Updated by Peter Amstutz on 2022-12-02</strong></p>
<ul></ul><p>Tom Clegg wrote in <a href="#note-21">#note-21</a>:</p>
<blockquote>
<p>In DiskCacheSlot.evict() would it be a good idea to have an explicit self.content.close() and self.filehandle.close() instead of just assigning None?</p>
</blockquote>
<pre>
# The mmap region might be in use when we decided to evict
# it. This can happen if the cache is too small.
#
# If we call close() now, it'll throw an error if
# something tries to access it.
#
# However, we don't need to explicitly call mmap.close()
#
# I confirmed in mmapmodule.c that both close
# and deallocate do the same thing:
#
# a) close the file descriptor
# b) unmap the memory range
#
# So we can forget it in the cache and delete the file on
# disk, and it will tear it down after any other
# lingering Python references to the mapped memory are
# gone.
</pre>
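<p>The teardown order that comment describes can be demonstrated with a small standalone sketch (not the SDK's code): unlink the file and forget the mapping without calling close(), and a reader still holding the mapped memory is unaffected while the kernel frees everything once the last reference is gone.</p>

```python
import mmap
import os
import tempfile

def make_block(d, data):
    # Write a block file and return the path plus a read-only mapping.
    path = os.path.join(d, "block")
    with open(path, "wb") as f:
        f.write(data)
    f = open(path, "rb")
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    f.close()   # mmap holds its own duplicated descriptor
    return path, m

def evict(path, cache):
    # Forget the block in the cache and delete the file on disk;
    # deliberately no mmap.close(), per the reasoning above.
    os.unlink(path)
    cache.pop(path, None)
```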
<blockquote>
<p>In DiskCacheSlot.set(), don't we need to call self.ready.set() after assigning content = mmap() so get() doesn't wait forever? (The memory cache does so, OTOH none of the tests deadlock, so maybe I'm missing something.)</p>
</blockquote>
<p>This is done by KeepBlockCache.set() because it now re-tries if there is a failure the first time.</p>
<pre>
try:
    if tryagain:
        # There was an error. Evict some slots and try again.
        self.cap_cache()
        slot.set(blob)
finally:
    # Set the notice that we are done with the cache
    # slot one way or another.
    slot.ready.set()
</pre>
<p>The logic is slightly silly in that if <code>tryagain</code> is False it won't do anything but it will still process the finally block.</p>
<p>I cleaned up the control flow to hopefully be less confusing.</p>
<blockquote>
<p>Rest LGTM.</p>
</blockquote>
<p>I will merge if this passes.</p>
<p>18842-arv-mount-disk-config @ <a class="changeset" title="18842: Clean up keep cache set logic a little more Arvados-DCO-1.1-Signed-off-by: Peter Amstutz ..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/31f36e0c8aca5afe43cbe6273ecb8509fa41600b">31f36e0c8aca5afe43cbe6273ecb8509fa41600b</a></p>
<p><a class="external" href="https://ci.arvados.org/view/Developer/job/developer-run-tests/3397/">developer-run-tests: #3397</a></p>
<p><strong>Updated by Tom Clegg on 2022-12-02</strong></p>
<p>Aha. So (memory-backed) CacheSlot.set() doesn't need to call self.ready.set() any more?</p>
<p><strong>Updated by Peter Amstutz on 2022-12-02</strong></p>
<ul></ul><p>Tom Clegg wrote in <a href="#note-23">#note-23</a>:</p>
<blockquote>
<p>Aha. So (memory-backed) CacheSlot.set() doesn't need to call self.ready.set() any more?</p>
</blockquote>
<p>I refactored it. Now both memory cache and disk cache call <code>self.ready.set()</code> when the cache is set successfully. The <code>KeepCache.set()</code> method will call <code>slot.set(None)</code> in the case where it repeatedly failed to set the cache slot, which will call <code>self.ready.set()</code>.</p>
<p>18842-arv-mount-disk-config @ <a class="changeset" title="18842: One more refactor Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <peter.amstutz@curii.com>" href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/1f2223927141f11055c740c9c047ba007ab81e14">1f2223927141f11055c740c9c047ba007ab81e14</a></p>
<p><a class="external" href="https://ci.arvados.org/job/developer-run-tests/3400/">developer-run-tests: #3400</a></p>
<p><strong>Updated by Peter Amstutz on 2022-12-02</strong></p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul>
<p><strong>Updated by Peter Amstutz on 2022-12-05</strong></p>
<ul><li><strong>File</strong> <a href="/attachments/3154">arvados_fuse-2.5.0.dev20221202182828.tar.gz</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/3154/arvados_fuse-2.5.0.dev20221202182828.tar.gz">arvados_fuse-2.5.0.dev20221202182828.tar.gz</a> added</li><li><strong>File</strong> <a href="/attachments/3153">arvados-python-client-2.5.0.dev20221202182828.tar.gz</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/3153/arvados-python-client-2.5.0.dev20221202182828.tar.gz">arvados-python-client-2.5.0.dev20221202182828.tar.gz</a> added</li></ul><p>Test packages -- requires both updated SDK and FUSE to be installed</p>
<p><strong>Updated by Peter Amstutz on 2022-12-13</strong></p>
<ul><li><strong>Release</strong> set to <i>47</i></li></ul>
<p><strong>Updated by Brett Smith on 2023-01-26</strong></p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-1 priority-4 priority-default parent" href="/issues/19981">Bug #19981</a>: Containers that used an old DefaultKeepCacheRAM no longer get reused after a configuration change</i> added</li></ul>