Arvados: Issueshttps://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422024-03-19T13:15:14ZArvados
Redmine Arvados - Bug #21607 (New): arv-mount memory usage grows over timehttps://dev.arvados.org/issues/216072024-03-19T13:15:14ZPeter Amstutzpeter.amstutz@curii.com
<p>arv-mount releases metadata (collection and project listings) for files and directories that haven't been used recently to prevent unlimited memory growth.</p>
<p>Ideally it should reach a ceiling and then level off as new stuff replaces the memory used by old stuff. However, in the current version, memory usage still creeps up.</p>
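<p>As a rough illustration of the "reach a ceiling and then level off" behavior described above, here is a minimal least-recently-used cache sketch. It is written in Go for illustration only; arv-mount itself is Python, and the names <code>lruCache</code>, <code>touch</code>, and <code>maxBytes</code> are hypothetical, not arv-mount's actual API. The point is that tracked size should stay bounded no matter how many distinct entries are touched.</p>

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is a hypothetical stand-in for a cached directory or collection listing.
type entry struct {
	key  string
	size int
}

// lruCache tracks entries in recency order and evicts the least
// recently used ones once the total size exceeds maxBytes.
type lruCache struct {
	maxBytes int
	curBytes int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func newLRUCache(maxBytes int) *lruCache {
	return &lruCache{
		maxBytes: maxBytes,
		order:    list.New(),
		items:    map[string]*list.Element{},
	}
}

// touch records a use of key (adding it if it is new), then evicts
// from the back of the list until the cache is under its cap again.
func (c *lruCache) touch(key string, size int) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
	} else {
		c.items[key] = c.order.PushFront(&entry{key: key, size: size})
		c.curBytes += size
	}
	for c.curBytes > c.maxBytes && c.order.Len() > 1 {
		oldest := c.order.Back()
		ent := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.items, ent.key)
		c.curBytes -= ent.size
	}
}

func main() {
	c := newLRUCache(100)
	for i := 0; i < 1000; i++ {
		c.touch(fmt.Sprintf("collection-%d", i), 30)
	}
	// Tracked size stays at or below the cap regardless of how many keys were touched.
	fmt.Println("bytes:", c.curBytes, "entries:", len(c.items))
}
```

<p>If tracked size levels off like this but process memory still grows, that would point at objects being referenced from somewhere outside the cache's own bookkeeping.</p>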
<p>arv-mount would benefit from additional debugging and memory profiling to determine if there are objects being held past their intended lifetime.</p> Arvados - Feature #21606 (In Progress): configurable keep-web output buffer to reduce delay betwe...https://dev.arvados.org/issues/216062024-03-19T03:59:41ZTom Cleggtom@curii.com
<p>According to <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Go FileSystem / FUSE mount supports block prefetch (In Progress)" href="https://dev.arvados.org/issues/18961">#18961</a>, now that <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress)" href="https://dev.arvados.org/issues/2960">#2960</a> has reduced the TTFB for fetching a block, predicting and pre-fetching the next block appears to be more complex than it's worth.</p>
<p>Instead, in a typical scenario where the backend (keepstore→keep-web) bandwidth is faster than the frontend (keep-web→client), keep-web can reduce or eliminate the between-block delay by writing to an asynchronous output buffer. While keep-web is waiting a few milliseconds for the next block to start arriving from the backend, the client continues to receive the data that has accumulated in the output buffer.</p>
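<p>A minimal sketch of the asynchronous output buffer idea, assuming a hypothetical <code>asyncWriter</code> type (this is not keep-web's actual implementation). Writes are queued on a bounded channel and drained to the client by a background goroutine, so the producer can go back to waiting on the backend while buffered data keeps flowing. For brevity the buffer here is bounded by number of chunks; a real version would more likely bound the number of bytes, which is what the configurable size would control.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// asyncWriter is a hypothetical helper: Write enqueues a copy of the
// data and returns as soon as there is room in the queue, while a
// background goroutine writes queued chunks to dst in order.
type asyncWriter struct {
	chunks chan []byte
	done   chan struct{}
	err    error // first error from dst; read only after done is closed
}

func newAsyncWriter(dst io.Writer, maxChunks int) *asyncWriter {
	w := &asyncWriter{
		chunks: make(chan []byte, maxChunks),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(w.done)
		for chunk := range w.chunks {
			if w.err != nil {
				continue // keep draining so Write never blocks forever
			}
			_, w.err = dst.Write(chunk)
		}
	}()
	return w
}

// Write blocks only when the queue is full, i.e. when the client is
// slower than the backend by more than the buffer can absorb.
func (w *asyncWriter) Write(p []byte) (int, error) {
	chunk := append([]byte(nil), p...) // the caller may reuse p
	w.chunks <- chunk
	return len(p), nil
}

// Close flushes everything still queued and reports the first write error.
func (w *asyncWriter) Close() error {
	close(w.chunks)
	<-w.done
	return w.err
}

// demo pushes three "blocks" through the buffer and returns what the
// simulated client received.
func demo() string {
	var client bytes.Buffer
	w := newAsyncWriter(&client, 4)
	for _, block := range []string{"block1 ", "block2 ", "block3"} {
		io.WriteString(w, block)
	}
	if err := w.Close(); err != nil {
		return "error: " + err.Error()
	}
	return client.String()
}

func main() {
	fmt.Println(demo())
}
```

<p>One design consequence of this sketch: a write error on the client side is only reported at <code>Close</code>, so a caller that wants to stop fetching blocks early would need an additional signal.</p>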
<p>The size of the output buffer should be configurable.</p> Arvados - Bug #21603 (In Progress): Not recognizing subnet error returned as InvalidParameterValuehttps://dev.arvados.org/issues/216032024-03-18T14:27:16ZPeter Amstutzpeter.amstutz@curii.com
<p><code>Mar 18 03:53:48 ip-172-25-144-184 arvados-dispatch-cloud[283002]: {"ClusterID":"xxxxx","InstanceType":"r52xlarge.preemptible","PID":283002,"error":"InvalidParameterValue: Not enough free addresses in subnet subnet-0f83ca79\n\tstatus code: 400, request id: 6cbcffe1-5b77-4dee-8fbf-c20f67892c95","level":"error","msg":"create failed","time":"2024-03-18T03:53:48.927972989Z"}</code></p>
<p>This is a subnet-specific error (it should switch to the other subnet) but the current function won't recognize it as such:</p>
<pre>
func isErrorSubnetSpecific(err error) bool {
	aerr, ok := err.(awserr.Error)
	if !ok {
		return false
	}
	code := aerr.Code()
	return strings.Contains(code, "Subnet") ||
		code == "InsufficientInstanceCapacity" ||
		code == "InsufficientVolumeCapacity" ||
		code == "Unsupported"
}
</pre>
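<p>One possible way to recognize this case, sketched under stated assumptions: the error code <code>InvalidParameterValue</code> is too generic to treat as subnet-specific on its own, so the sketch also checks the message text. The <code>awsError</code> interface and <code>fakeErr</code> type below are stand-ins defined here so the example is self-contained (they mirror the <code>Code()</code> and <code>Message()</code> methods of <code>awserr.Error</code>), and matching on the phrase "Not enough free addresses in subnet" is an assumption based solely on the log line above.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// awsError is a local stand-in for the Code/Message part of awserr.Error.
type awsError interface {
	error
	Code() string
	Message() string
}

// fakeErr is a hypothetical error type used only to exercise the check.
type fakeErr struct{ code, msg string }

func (e fakeErr) Error() string   { return e.code + ": " + e.msg }
func (e fakeErr) Code() string    { return e.code }
func (e fakeErr) Message() string { return e.msg }

// isErrorSubnetSpecific reports whether err suggests that retrying in
// a different subnet could succeed.
func isErrorSubnetSpecific(err error) bool {
	var aerr awsError
	if !errors.As(err, &aerr) {
		return false
	}
	code := aerr.Code()
	return strings.Contains(code, "Subnet") ||
		code == "InsufficientInstanceCapacity" ||
		code == "InsufficientVolumeCapacity" ||
		code == "Unsupported" ||
		// Assumption: a generic code counts only when the message
		// names the subnet address shortage seen in the log.
		(code == "InvalidParameterValue" &&
			strings.Contains(aerr.Message(), "Not enough free addresses in subnet"))
}

func main() {
	full := fakeErr{"InvalidParameterValue", "Not enough free addresses in subnet subnet-0f83ca79"}
	other := fakeErr{"InvalidParameterValue", "some unrelated parameter problem"}
	fmt.Println(isErrorSubnetSpecific(full))
	fmt.Println(isErrorSubnetSpecific(other))
}
```

<p>Matching on message text is brittle if the wording changes, so this is a stopgap unless the API offers a more specific error code for this condition.</p>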
<p>Because the error was unrecognized, the fallback behavior seems to be to rate-limit itself by setting a maximum number of concurrent containers.</p> Arvados - Task #21602 (In Progress): Review 21601-setuptools-git-depshttps://dev.arvados.org/issues/216022024-03-17T01:06:02ZBrett Smithbrett.smith@curii.comArvados - Bug #21601 (In Progress): fpm virtualenv packages not using branch versions for depende...https://dev.arvados.org/issues/216012024-03-15T20:38:09ZPeter Amstutzpeter.amstutz@curii.com
<p><a class="external" href="https://dev.arvados.org/issues/19744#note-30">https://dev.arvados.org/issues/19744#note-30</a></p>
<p>The python3-arvados-cwl-runner_2.8.0~dev20240314145937-1_amd64.deb package has arvados-python-client 2.7.1 and crunchstat-summary 2.7.1, when it should have the dev versions from the same commit.</p>
<p>I went back and looked at earlier packages: python3-arvados-cwl-runner_2.7.1~rc3-1_amd64.deb has arvados-python-client 2.7.1rc3 (as expected) and python3-arvados-cwl-runner_2.7.0~dev20230908133938-1_amd64.deb has arvados-python-client 2.7.0.dev20230908133938 (also as expected).</p>
<p>My current theory is that this behavior got lost in the changes made in 20846-package-build-fixes, but I need to find out how it worked before.</p> Arvados - Bug #21600 (In Progress): Banner tests failing https://dev.arvados.org/issues/216002024-03-15T19:05:56ZLisa Knox
<p><a href="https://ci.arvados.org/job/developer-run-tests-services-workbench2/557/">developer-run-tests-services-workbench2: #557 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests-services-workbench2&build=557" alt="" /></a> (<a class="external" href="https://ci.arvados.org/job/developer-run-tests-services-workbench2/557/consoleFull">consoleFull</a>)</p>
<p>tests failing on main</p> Arvados - Bug #21598 (New): Local keepstore invoked by crunch-run should never do EmptyTrash workhttps://dev.arvados.org/issues/215982024-03-15T18:32:48ZTom Cleggtom@curii.com
<p>We don't want N compute nodes periodically checking expiry times on all of the trashed blocks on all backend volumes.</p> Arvados - Idea #21595 (New): 'shared' should use usernames, not full nameshttps://dev.arvados.org/issues/215952024-03-14T14:52:12ZPeter Amstutzpeter.amstutz@curii.com
<p>The 'shared' directory uses full names. These have a couple of problems:</p>
<ul>
<li>Always contain spaces and may have other characters that make them awkward to use with Unix tooling</li>
<li>Not unique. For example, pirca has multiple accounts with full_name "Peter Amstutz". FUSE ends up picking one account, and the other accounts can't be accessed through FUSE at all.</li>
</ul>
<p>It should use 'username' instead, which is unique on a given Arvados instance.</p>
<p>I think the only question is whether it is worth the effort to maintain backwards compatibility (by making the 'username' behavior a new option) or whether we should just change the existing behavior in place.</p>
<p>I suppose one way to do it would be to change to using usernames by default but add an option that restores the previous behavior of using full names.</p> Arvados - Task #21592 (In Progress): Review 21578-mount-debughttps://dev.arvados.org/issues/215922024-03-13T16:06:51ZPeter Amstutzpeter.amstutz@curii.comArvados - Feature #21578 (In Progress): Add debug logging option to arvados-client mounthttps://dev.arvados.org/issues/215782024-03-11T15:38:01ZTom Cleggtom@curii.com
<p>When invoked as</p>
<pre><code>arvados-client mount --log-level=debug ...</code></pre>
<p>and an error code is returned to a FUSE API call ("I/O error"), the original (typically much more informative) error message should also be logged to the terminal.</p> Arvados - Task #21555 (In Progress): Review 21541-arv-mount-keyerror-rebasehttps://dev.arvados.org/issues/215552024-02-28T17:03:32ZPeter Amstutzpeter.amstutz@curii.comArvados - Bug #21541 (In Progress): arv-mount KeyError during cap_cache - Seemingly lost track of...https://dev.arvados.org/issues/215412024-02-26T19:01:27ZBrett Smithbrett.smith@curii.com
<p>A user's arv-mount process crashed with this traceback. Afterward, trying to list files in the mount root returned EIO.</p>
<pre>2024-02-23 23:36:17 arvados.arvados_fuse[2803055] ERROR: Unhandled exception during FUSE operation
Traceback (most recent call last):
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 327, in catch_exceptions_wrapper
    return orig_func(self, *args, **kwargs)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 570, in lookup
    self.inodes.touch(p)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 276, in touch
    self.inode_cache.touch(entry)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 234, in touch
    self.manage(obj)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 228, in manage
    self.cap_cache()
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 212, in cap_cache
    self._remove(ent, True)
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 186, in _remove
    obj.kernel_invalidate()
  File "venv/lib/python3.10/site-packages/arvados_fuse/fusedir.py", line 220, in kernel_invalidate
    parent = self.inodes[self.parent_inode]
  File "venv/lib/python3.10/site-packages/arvados_fuse/__init__.py", line 260, in __getitem__
    return self._entries[item]
KeyError: 865
</pre>
<p>This exact same traceback appeared seven times in one second. It's not clear whether that's multiple threads running into the same issue, or the error recurring because of different accesses.</p>
<p>Note that this mount is intentionally accessible to multiple users on the host, so you can assume there was concurrent access. Unfortunately, for the same reason, it's hard to know whether a specific operation caused the error.</p> Arvados - Task #21511 (In Progress): Review 21357-favorites-nameshttps://dev.arvados.org/issues/215112024-02-14T16:48:56ZPeter Amstutzpeter.amstutz@curii.comArvados - Feature #21494 (In Progress): Get Java and R SDKs out of the critical path of main bran...https://dev.arvados.org/issues/214942024-02-09T17:18:46ZBrett Smithbrett.smith@curii.com
<p>The big idea: The Java and R SDKs are neither mature nor critical enough that it makes sense to hold them to the same build standards as the rest of Arvados. They should normally not block developer-run-tests or the other jobs that follow a merge to main. Instead we can have a separate Jenkins job to run when needed (like testing a change to one of these specific SDKs) and as part of the larger release pipeline.</p>
<p>Parts of the job:</p>
<ul>
<li>In <code>doc/Rakefile</code>, consider a way to specify which SDKs you do and don't want to build docs for. We want to build the Python SDK as part of developer-run-tests, and the R SDK as part of this new Jenkins job. It would be nice if there was a switch that accepted a list of known SDKs and built what you specified.</li>
<li>Write a Jenkins job that tests the Java SDK, R SDK, and doc linkchecker after building R documentation.</li>
<li>Add this new job to multijobs and pipelines where needed, per above.</li>
<li>Reorganize developer-run-tests to remove those tests from the existing jobs. (It might make sense to do a somewhat larger reorganization as part of this.)</li>
<li>Note that the doc <em>publishing</em> job (not the linkchecker test) should still build and publish all SDKs. Retaining that behavior is a requirement.</li>
</ul> Arvados - Bug #21412 (In Progress): User profile bugs on refreshhttps://dev.arvados.org/issues/214122024-01-24T18:02:15ZLisa Knox
<p>When viewing a User Profile page, if you refresh, the breadcrumbs reset to "Users > [uuid]" when previously they said either "Users > [full name]" or "Groups > [Group Name] > [full name]". When viewing another user's home project page, if you refresh, it shows the "project not found" screen.</p>