Arvados: Issues https://dev.arvados.org/ https://dev.arvados.org/favicon.ico?1557688842 2024-03-26T14:10:39Z Arvados
Redmine
Arvados - Task #21619 (In Progress): Review 21617-fed-content https://dev.arvados.org/issues/21619 2024-03-26T14:10:39Z Tom Clegg tom@curii.com
Arvados - Bug #21618 (New): cloudtest should give up if test instance disappears from listing bef... https://dev.arvados.org/issues/21618 2024-03-25T16:52:07Z Tom Clegg tom@curii.com
<p>Currently, if an instance/image has a problem that causes it to shut down before responding to a boot probe, cloudtest keeps probing after the instance disappears from the listing, which is clearly futile.</p>
Arvados - Bug #21617 (In Progress): Timeout error reading content from collection on a remote clu... https://dev.arvados.org/issues/21617 2024-03-25T14:43:50Z Tom Clegg tom@curii.com
In a 3-way federation with login cluster z1111:
<ul>
<li>a collection stored on z1111 can be read from z2222 (e.g., workbench.z2222/collections/z1111-4zz18-...)</li>
<li>a collection stored on z2222 cannot be read from z1111 (timeout)</li>
<li>a collection stored on z2222 cannot be read from z3333 (timeout)</li>
</ul>
<p>It looks like the intermediate cluster's keepstore process cannot retrieve the list of keep services from the cluster where the data is stored ("failed to validate remote token") -- this auto-retries in the background for a while, then eventually blockReadRemote gives up.</p>
<p>Manual testing, with jutro/tordo/pirca playing the roles of z1111/z2222/z3333, indicates the same problem existed before and after <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress)" href="https://dev.arvados.org/issues/2960">#2960</a> was merged and deployed to tordo.</p>
Arvados - Feature #21606 (In Progress): configurable keep-web output buffer to reduce delay betwe... https://dev.arvados.org/issues/21606 2024-03-19T03:59:41Z Tom Clegg tom@curii.com
<p>According to <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Go FileSystem / FUSE mount supports block prefetch (Closed)" href="https://dev.arvados.org/issues/18961">#18961</a>, now that <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress)" href="https://dev.arvados.org/issues/2960">#2960</a> has reduced the TTFB for fetching a block, predicting and pre-fetching the next block appears to be more complex than it's worth.</p>
<p>Instead, in a typical scenario where the backend (keepstore→keep-web) bandwidth is faster than the frontend (keep-web→client), keep-web can reduce or eliminate the between-block delay by writing to an asynchronous output buffer. While keep-web is waiting a few milliseconds for the next block to start arriving from the backend, the client continues to receive the data that has accumulated in the output buffer.</p>
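<p>The buffering idea can be sketched roughly as follows. This is a hypothetical Python illustration, not keep-web's implementation (keep-web is written in Go): the class name <code>AsyncOutputBuffer</code>, the <code>sink</code> callable, and the <code>max_chunks</code> limit are all assumptions made for the example, and a real buffer size would more likely be configured in bytes.</p>

```python
import queue
import threading


class AsyncOutputBuffer:
    """Decouple a fast producer (backend reads) from a slow consumer (client writes).

    Hypothetical illustration: max_chunks stands in for the configurable
    buffer size discussed in this issue.
    """

    _EOF = object()  # sentinel that tells the drain thread to stop

    def __init__(self, sink, max_chunks=4):
        self._sink = sink                      # callable that writes bytes to the client
        self._queue = queue.Queue(max_chunks)  # bounded: write() blocks when full
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write(self, chunk):
        # Returns as soon as the chunk is queued, so the caller can go back
        # to fetching the next block while the client is still being served.
        self._queue.put(chunk)

    def close(self):
        # Flush everything that is still buffered, then stop the drain thread.
        self._queue.put(self._EOF)
        self._thread.join()

    def _drain(self):
        # Runs in the background, forwarding chunks to the client in order.
        while True:
            chunk = self._queue.get()
            if chunk is self._EOF:
                return
            self._sink(chunk)
```

<p>A caller would <code>write()</code> each chunk as it arrives from the backend and <code>close()</code> at the end of the response. Once the bounded queue is full, <code>write()</code> blocks, which caps memory use. Error handling (for example, a client that disconnects mid-transfer) is omitted from the sketch.</p>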
<p>The size of the output buffer should be configurable.</p>
Arvados - Feature #21599 (New): _inspect/requests endpoint should reveal whether each request is ... https://dev.arvados.org/issues/21599 2024-03-15T18:45:20Z Tom Clegg tom@curii.com
<p>This is a little inconvenient because the queue decision happens lower in the handler stack than the inspector (and we don't want to change that).</p>
<p>We can do something similar to responseLogFieldsContextKey in <a class="source" href="https://dev.arvados.org/projects/arvados/repository/arvados/entry/sdk/go/httpserver/logger.go">source:sdk/go/httpserver/logger.go</a> -- attach an atomic.Value to the request context as it passes through the Inspect handler, then have RequestLimiter Store() queue status there (queue label, time the request was released for processing), and Load() when generating the _inspect/requests report.</p>
Arvados - Bug #21598 (New): Local keepstore invoked by crunch-run should never do EmptyTrash work https://dev.arvados.org/issues/21598 2024-03-15T18:32:48Z Tom Clegg tom@curii.com
<p>We don't want N compute nodes periodically checking expiry times on all of the trashed blocks on all backend volumes.</p>
Arvados - Feature #21578 (Resolved): Add debug logging option to arvados-client mount https://dev.arvados.org/issues/21578 2024-03-11T15:38:01Z Tom Clegg tom@curii.com
<p>When invoked as</p>
<pre><code>arvados-client mount --log-level=debug ...</code></pre>
<p>and an error code is returned to a FUSE API call ("I/O error"), the original (typically much more informative) error message should also be logged to the terminal.</p>
Arvados - Bug #21417 (Resolved): Stop trying to read image timestamp from docker metadata in arv-... https://dev.arvados.org/issues/21417 2024-01-25T16:44:16Z Tom Clegg tom@curii.com
<p>This part of <a class="source" href="https://dev.arvados.org/projects/arvados/repository/arvados/entry/sdk/python/arvados/commands/keepdocker.py">source:sdk/python/arvados/commands/keepdocker.py</a> should go away so it doesn't crash on new image tarball formats:</p>
<pre>
json_file = image_tar.extractfile(image_tar.getmember(json_filename))
image_metadata = json.loads(json_file.read().decode('utf-8'))
json_file.close()
image_tar.close()
link_base = {'head_uuid': coll_uuid, 'properties': {}}
if 'created' in image_metadata:
    link_base['properties']['image_timestamp'] = image_metadata['created']
</pre>
<p>See <a class="issue tracker-6 status-3 priority-4 priority-default closed" title="Idea: test-provision-debian11 fails loading workflow Docker image (Resolved)" href="https://dev.arvados.org/issues/21408">#21408</a> for example.</p>
<p>(Tom & Peter discussed this offline and concluded that saving the image timestamp is not important enough to justify maintaining the code.)</p>
Arvados - Task #21380 (Resolved): Review 21379-user-activity-remote-collection https://dev.arvados.org/issues/21380 2024-01-12T19:48:23Z Tom Clegg tom@curii.com
Arvados - Bug #21379 (Resolved): arv-user-activity crashes on file_download event for remote coll... https://dev.arvados.org/issues/21379 2024-01-12T19:37:43Z Tom Clegg tom@curii.com
<pre>
User activity on pirca between 2024-01-11 05:00 and 2024-01-12 05:00
Traceback (most recent call last):
  File "/usr/bin/arv-user-activity", line 8, in <module>
    sys.exit(main())
  File "/usr/share/python3/dist/python3-arvados-user-activity/lib/python3.7/site-packages/arvados_user_activity/main.py", line 214, in main
    getCollectionName(arv, e["properties"].get("collection_uuid"), e["properties"].get("portable_data_hash")),
  File "/usr/share/python3/dist/python3-arvados-user-activity/lib/python3.7/site-packages/arvados_user_activity/main.py", line 111, in getCollectionName
    u = arv.collections().list(filters=filters, order="created_at", limit=1).execute().get("items")
  File "/usr/share/python3/dist/python3-arvados-user-activity/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/share/python3/dist/python3-arvados-user-activity/lib/python3.7/site-packages/googleapiclient/http.py", line 938, in execute
    raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 400 when requesting https://pirca.arvadosapi.com/arvados/v1/collections?filters=%5B%5B%22uuid%22%2C+%22%3D%22%2C+%22<a href="https://arvadosapi.com/tordo-4zz18-kaaj8hjcnqb8i0p">tordo-4zz18-kaaj8hjcnqb8i0p</a>%22%5D%5D&order=created_at&limit=1&alt=json returned "cannot execute federated list query unless count=="none"">
</pre>
<pre>
User activity on tordo between 2024-01-11 19:33 and 2024-01-12 19:33
Traceback (most recent call last):
  File "/tmp/venv/bin/arv-user-activity", line 8, in <module>
    sys.exit(main())
  File "/tmp/venv/lib/python3.9/site-packages/arvados_user_activity/main.py", line 214, in main
    getCollectionName(arv, e["properties"].get("collection_uuid"), e["properties"].get("portable_data_hash")),
  File "/tmp/venv/lib/python3.9/site-packages/arvados_user_activity/main.py", line 111, in getCollectionName
    u = arv.collections().list(filters=filters, order="created_at", limit=1, count="none").execute().get("items")
  File "/tmp/venv/lib/python3.9/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/tmp/venv/lib/python3.9/site-packages/googleapiclient/http.py", line 938, in execute
    raise HttpError(resp, content, uri=self.uri)
arvados.errors.ApiError: <HttpError 400 when requesting https://tordo.arvadosapi.com/arvados/v1/collections?filters=%5B%5B%22uuid%22%2C+%22%3D%22%2C+%22<a href="https://arvadosapi.com/pirca-4zz18-tsiyvmfkr2gub8w">pirca-4zz18-tsiyvmfkr2gub8w</a>%22%5D%5D&order=created_at&limit=1&count=none&alt=json returned "cannot execute federated list query with limit (1) < nUUIDs (1), offset (0) > 0, or order ([created_at]) parameter">
</pre>
Arvados - Task #21352 (Resolved): Review 21258-flaky-adc-test https://dev.arvados.org/issues/21352 2024-01-05T15:53:55Z Tom Clegg tom@curii.com
Arvados - Task #21325 (Resolved): Review 21285-max-gw-tunnels https://dev.arvados.org/issues/21325 2024-01-01T21:58:46Z Tom Clegg tom@curii.com
Arvados - Task #21324 (Resolved): Review 21276-test-race https://dev.arvados.org/issues/21324 2023-12-29T21:35:44Z Tom Clegg tom@curii.com
Arvados - Idea #21323 (New): System services use cache/config directories indicated by XDG env va... https://dev.arvados.org/issues/21323 2023-12-29T16:49:54Z Tom Clegg tom@curii.com
<p>From <a class="issue tracker-2 status-3 priority-4 priority-default closed parent" title="Feature: Go SDK supports local filesystem-backed data cache (Resolved)" href="https://dev.arvados.org/issues/20318#note-19">#20318#note-19</a></p>
<ul>
<li>If the systemd $*_DIRECTORY variable is set, use that.</li>
<li>Otherwise, if the XDG $XDG_*_HOME/$XDG_*_DIR variable is set, use that. (See <a class="issue tracker-6 status-1 priority-4 priority-default" title="Idea: Support XDG base directory envvars throughout the Python SDK (New)" href="https://dev.arvados.org/issues/21020">#21020</a>)</li>
<li>Otherwise, default to current behavior.</li>
<li>Update our systemd unit files to use the *Directory directives.</li>
</ul>
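<p>The precedence above might look like the following sketch. The function name <code>choose_dir</code>, the <code>arvados</code> subdirectory name, and the choice to handle only cache and config directories are assumptions for illustration; <code>CACHE_DIRECTORY</code> and <code>CONFIGURATION_DIRECTORY</code> are the variables systemd sets for <code>CacheDirectory=</code> and <code>ConfigurationDirectory=</code>.</p>

```python
import os

# (systemd variable, XDG variable) for each kind of directory.
_ENV_VARS = {
    "cache": ("CACHE_DIRECTORY", "XDG_CACHE_HOME"),
    "config": ("CONFIGURATION_DIRECTORY", "XDG_CONFIG_HOME"),
}


def choose_dir(kind, default, environ=None, subdir="arvados"):
    """Pick a directory using the precedence systemd > XDG > current default."""
    env = os.environ if environ is None else environ
    systemd_var, xdg_var = _ENV_VARS[kind]

    systemd_value = env.get(systemd_var)
    if systemd_value:
        # systemd joins multiple configured directories with ":"; use the first.
        return systemd_value.split(":")[0]

    xdg_value = env.get(xdg_var)
    if xdg_value:
        # XDG variables name a shared base directory, so append an
        # application-specific subdirectory (the name is an assumption).
        return os.path.join(xdg_value, subdir)

    # Neither variable is set: keep the current behavior.
    return default
```

<p>Passing <code>environ</code> explicitly keeps the function easy to test; a service would normally call it with the default, which reads <code>os.environ</code>.</p>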
<p><a class="external" href="https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#RuntimeDirectory=">https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#RuntimeDirectory=</a></p>
Arvados - Bug #21319 (New): Avoid waiting/deadlock when a controller handler performs subrequests... https://dev.arvados.org/issues/21319 2023-12-27T23:26:44Z Tom Clegg tom@curii.com