Arvados: Issues | https://dev.arvados.org/ | 2024-03-27T16:15:58Z | Arvados | Redmine
Arvados - Task #21633 (New): Review | https://dev.arvados.org/issues/21633 | 2024-03-27T16:15:58Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Task #21632 (New): Review | https://dev.arvados.org/issues/21632 | 2024-03-27T16:11:16Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Task #21630 (New): Review | https://dev.arvados.org/issues/21630 | 2024-03-27T16:10:40Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Task #21629 (New): Review | https://dev.arvados.org/issues/21629 | 2024-03-27T16:10:35Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Task #21628 (New): Review | https://dev.arvados.org/issues/21628 | 2024-03-27T16:10:04Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Task #21627 (New): Review | https://dev.arvados.org/issues/21627 | 2024-03-27T16:09:47Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Task #21626 (New): Review | https://dev.arvados.org/issues/21626 | 2024-03-27T16:08:43Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Task #21624 (In Progress): Review 21598-local-keepstore-emptytrash | https://dev.arvados.org/issues/21624 | 2024-03-27T16:08:02Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Task #21619 (In Progress): Review 21617-fed-content | https://dev.arvados.org/issues/21619 | 2024-03-26T14:10:39Z | Tom Clegg (tom@curii.com)
Arvados - Bug #21617 (In Progress): Timeout error reading content from collection on a remote clu... | https://dev.arvados.org/issues/21617 | 2024-03-25T14:43:50Z | Tom Clegg (tom@curii.com)
In a 3-way federation with login cluster z1111:
<ul>
<li>a collection stored on z1111 can be read from z2222 (e.g., workbench.z2222/collections/z1111-4zz18-...)</li>
<li>a collection stored on z2222 cannot be read from z1111 (timeout)</li>
<li>a collection stored on z2222 cannot be read from z3333 (timeout)</li>
</ul>
<p>It looks like the intermediate cluster's keepstore process cannot retrieve the list of keep services from the cluster where the data is stored ("failed to validate remote token") -- this auto-retries in the background for a while, then eventually blockReadRemote gives up.</p>
<p>Manual testing, with jutro/tordo/pirca playing the roles of z1111/z2222/z3333, indicates the same problem existed before and after <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress)" href="https://dev.arvados.org/issues/2960">#2960</a> was merged and deployed to tordo.</p>
Arvados - Feature #21606 (In Progress): configurable keep-web output buffer to reduce delay betwe... | https://dev.arvados.org/issues/21606 | 2024-03-19T03:59:41Z | Tom Clegg (tom@curii.com)
<p>According to <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Go FileSystem / FUSE mount supports block prefetch (Closed)" href="https://dev.arvados.org/issues/18961">#18961</a>, now that <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress)" href="https://dev.arvados.org/issues/2960">#2960</a> has reduced the TTFB for fetching a block, predicting and pre-fetching the next block appears to be more complex than it's worth.</p>
<p>Instead, in a typical scenario where the backend (keepstore→keep-web) bandwidth is higher than the frontend (keep-web→client) bandwidth, keep-web can reduce or eliminate the between-block delay by writing to an asynchronous output buffer. While keep-web is waiting a few milliseconds for the next block to start arriving from the backend, the client continues to receive the data that has accumulated in the output buffer.</p>
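<p>The idea above can be sketched roughly as follows. This is only an illustration in Python (keep-web itself is written in Go), and the names buffered_copy and buffer_chunks are hypothetical, not part of any Arvados API. For simplicity the buffer is sized in chunks rather than bytes: a producer keeps filling a bounded queue from the backend while a separate thread drains it to the client.</p>

```python
import queue
import threading


def buffered_copy(blocks, write, buffer_chunks=8):
    """Copy chunks from `blocks` to `write` through a bounded buffer.

    Hypothetical sketch: `blocks` stands in for data arriving from the
    backend, `write` for the (slower) client connection, and
    `buffer_chunks` for the configurable buffer size.
    """
    buf = queue.Queue(maxsize=buffer_chunks)
    done = object()  # sentinel marking the end of the stream
    errors = []

    def drain():
        failed = False
        while True:
            chunk = buf.get()
            if chunk is done:
                return
            if failed:
                continue  # keep draining so the producer never blocks forever
            try:
                write(chunk)
            except Exception as exc:
                errors.append(exc)
                failed = True

    writer = threading.Thread(target=drain)
    writer.start()
    try:
        for chunk in blocks:
            # Blocks only when the buffer is full, i.e. the client is
            # the bottleneck; otherwise the backend read proceeds while
            # the writer thread is still sending earlier chunks.
            buf.put(chunk)
    finally:
        buf.put(done)
        writer.join()
    if errors:
        raise errors[0]
```

<p>With a buffer of this kind, a pause on the producer side is hidden from the client for as long as the queue still holds data, which is the effect described above.</p>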
<p>The size of the output buffer should be configurable.</p>
Arvados - Bug #21601 (In Progress): fpm virtualenv packages not using branch versions for depende... | https://dev.arvados.org/issues/21601 | 2024-03-15T20:38:09Z | Peter Amstutz (peter.amstutz@curii.com)
<p><a class="external" href="https://dev.arvados.org/issues/19744#note-30">https://dev.arvados.org/issues/19744#note-30</a></p>
<p>The python3-arvados-cwl-runner_2.8.0~dev20240314145937-1_amd64.deb package has arvados-python-client 2.7.1 and crunchstat-summary 2.7.1, when it should have the dev versions from the same commit.</p>
<p>I went back and looked at earlier packages: python3-arvados-cwl-runner_2.7.1~rc3-1_amd64.deb has arvados-python-client 2.7.1rc3 (as expected) and python3-arvados-cwl-runner_2.7.0~dev20230908133938-1_amd64.deb has arvados-python-client 2.7.0.dev20230908133938 (also as expected).</p>
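<p>As a rough aid for checking packages, the relationship between the .deb version and the expected Python dependency version can be sketched as below. The mapping is inferred only from the three package names quoted in this report, and the function name expected_python_version is hypothetical. It is not the logic used by the package build scripts.</p>

```python
import re

# Pattern for names like
# python3-arvados-cwl-runner_2.7.1~rc3-1_amd64.deb
_DEB_NAME = re.compile(
    r"(?P<name>[^_]+)_(?P<version>[^_]+)-(?P<iteration>\d+)_(?P<arch>[^_.]+)\.deb"
)


def expected_python_version(deb_filename):
    """Return the Python package version that a .deb built from the
    same commit would be expected to bundle (assumed mapping)."""
    match = _DEB_NAME.fullmatch(deb_filename)
    if match is None:
        raise ValueError(f"unrecognized package name: {deb_filename!r}")
    base, _, pre = match.group("version").partition("~")
    if not pre:
        return base                # plain release, e.g. 2.7.1
    if pre.startswith("dev"):
        return f"{base}.{pre}"     # 2.7.0~dev2023... -> 2.7.0.dev2023...
    return f"{base}{pre}"          # 2.7.1~rc3 -> 2.7.1rc3
```

<p>For the package reported above, this gives 2.8.0.dev20240314145937 as the expected arvados-python-client version, rather than the 2.7.1 that was actually bundled.</p>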
<p>My current theory is that this behavior got lost in the changes made in 20846-package-build-fixes, but I need to find out how it worked before.</p>
Arvados - Bug #21598 (In Progress): Local keepstore invoked by crunch-run should never do EmptyTr... | https://dev.arvados.org/issues/21598 | 2024-03-15T18:32:48Z | Tom Clegg (tom@curii.com)
<p>We don't want N compute nodes periodically checking expiry times on all of the trashed blocks on all backend volumes.</p>
Arvados - Task #21555 (In Progress): Review 21541-arv-mount-keyerror-rebase | https://dev.arvados.org/issues/21555 | 2024-02-28T17:03:32Z | Peter Amstutz (peter.amstutz@curii.com)
Arvados - Bug #21541 (In Progress): arv-mount KeyError during cap_cache - Seemingly lost track of... | https://dev.arvados.org/issues/21541 | 2024-02-26T19:01:27Z | Brett Smith (brett.smith@curii.com)
<p>A user's arv-mount process crashed with this traceback. Afterward, trying to list files in the mount root returned EIO.</p>
<pre>2024-02-23 23:36:17 arvados.arvados_fuse[2803055] ERROR: Unhandled exception during FUSE operation
Traceback (most recent call last):
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 327, in catch_exceptions_wrapper
    return orig_func(self, *args, **kwargs)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 570, in lookup
    self.inodes.touch(p)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 276, in touch
    self.inode_cache.touch(entry)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 234, in touch
    self.manage(obj)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 228, in manage
    self.cap_cache()
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 212, in cap_cache
    self._remove(ent, True)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 186, in _remove
    obj.kernel_invalidate()
  File "venv/lib/python3.10/site-packages/arvados_fuse/fusedir.py", line 220, in kernel_invalidate
    parent = self.inodes[self.parent_inode]
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 260, in __getitem__
    return self._entries[item]
KeyError: 865
</pre>
<p>This exact same traceback appeared seven times in one second. It's not clear whether that's multiple threads running into the same issue, or the error recurring because of different accesses.</p>
<p>Note that this mount is intentionally accessible to multiple users on the host, so you can assume there was concurrent access. Unfortunately, for the same reason, it's hard to know whether a specific operation caused the error.</p>
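<p>The shape of the failure in the traceback can be modeled in a few lines. This is a simplified, hypothetical stand-in and not the arvados_fuse code: the classes Inodes and Entry and the method kernel_invalidate_tolerant are invented for illustration. It only shows that invalidating an entry whose parent inode is no longer in the table raises the same KeyError, and what a lookup that tolerates a missing parent might look like. How the parent came to be missing in the real process is not known.</p>

```python
class Inodes:
    """Hypothetical, minimal inode table keyed by inode number."""

    def __init__(self):
        self._entries = {}

    def __getitem__(self, item):
        return self._entries[item]

    def get(self, item):
        return self._entries.get(item)

    def add(self, inode, entry):
        self._entries[inode] = entry

    def remove(self, inode):
        self._entries.pop(inode, None)


class Entry:
    """Hypothetical directory entry that remembers its parent's inode."""

    def __init__(self, inodes, parent_inode):
        self.inodes = inodes
        self.parent_inode = parent_inode

    def kernel_invalidate(self):
        # Same lookup pattern as in the traceback: raises KeyError
        # if the parent inode is no longer in the table.
        return self.inodes[self.parent_inode]

    def kernel_invalidate_tolerant(self):
        # Assumed alternative: treat a missing parent as nothing to do.
        return self.inodes.get(self.parent_inode)


inodes = Inodes()
inodes.add(865, "parent directory")
child = Entry(inodes, parent_inode=865)
inodes.add(866, child)
inodes.remove(865)  # parent disappears before the child is invalidated

try:
    child.kernel_invalidate()
except KeyError as exc:
    print("KeyError:", exc)
```

<p>Whether skipping the invalidation is actually safe when the parent is missing depends on details of the real cache that this sketch does not capture.</p>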