Arvados: Issues https://dev.arvados.org/ 2024-03-26T14:10:39Z Arvados Redmine
Arvados - Task #21619 (In Progress): Review 21617-fed-content https://dev.arvados.org/issues/21619 2024-03-26T14:10:39Z Tom Clegg tom@curii.com
Arvados - Bug #21618 (New): cloudtest should give up if test instance disappears from listing bef... https://dev.arvados.org/issues/21618 2024-03-25T16:52:07Z Tom Clegg tom@curii.com
<p>Currently, if an instance/image has a problem that causes it to shut down before responding to a boot probe, cloudtest keeps probing after it disappears, which is clearly futile.</p>
Arvados - Bug #21617 (In Progress): Timeout error reading content from collection on a remote clu... https://dev.arvados.org/issues/21617 2024-03-25T14:43:50Z Tom Clegg tom@curii.com
In a 3-way federation with login cluster z1111:
<ul>
<li>a collection stored on z1111 can be read from z2222 (e.g., workbench.z2222/collections/z1111-4zz18-...)</li>
<li>a collection stored on z2222 cannot be read from z1111 (timeout)</li>
<li>a collection stored on z2222 cannot be read from z3333 (timeout)</li>
</ul>
<p>It looks like the intermediate cluster's keepstore process cannot retrieve the list of keep services from the cluster where the data is stored ("failed to validate remote token") -- this auto-retries in the background for a while, then eventually blockReadRemote gives up.</p>
<p>Manual testing, with jutro/tordo/pirca playing the roles of z1111/z2222/z3333, indicates the same problem existed before and after <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress)" href="https://dev.arvados.org/issues/2960">#2960</a> was merged and deployed to tordo.</p>
Arvados - Feature #21606 (In Progress): configurable keep-web output buffer to reduce delay betwe... https://dev.arvados.org/issues/21606 2024-03-19T03:59:41Z Tom Clegg tom@curii.com
<p>According to <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Go FileSystem / FUSE mount supports block prefetch (Closed)" href="https://dev.arvados.org/issues/18961">#18961</a>, now that <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress)" href="https://dev.arvados.org/issues/2960">#2960</a> has reduced the TTFB for fetching a block, predicting and pre-fetching the next block appears to be more complex than it's worth.</p>
<p>Instead, in a typical scenario where backend (keepstore→keep-web) bandwidth is higher than frontend (keep-web→client) bandwidth, keep-web can reduce or eliminate the between-block delay by writing to an asynchronous output buffer. While keep-web is waiting a few milliseconds for the next block to start arriving from the backend, the client continues to receive the data that has accumulated in the output buffer.</p>
<p>The size of the output buffer should be configurable.</p>
Arvados - Feature #21599 (New): _inspect/requests endpoint should reveal whether each request is ... https://dev.arvados.org/issues/21599 2024-03-15T18:45:20Z Tom Clegg tom@curii.com
<p>This is a little inconvenient because the queue decision happens lower in the handler stack than the inspector (and we don't want to change that).</p>
<p>We can do something similar to responseLogFieldsContextKey in <a class="source" href="https://dev.arvados.org/projects/arvados/repository/arvados/entry/sdk/go/httpserver/logger.go">source:sdk/go/httpserver/logger.go</a> -- attach an atomic.Value to the request context as it passes through the Inspect handler, then have RequestLimiter Store() queue status there (queue label, time the request was released for processing), and Load() when generating the _inspect/requests report.</p>
Arvados - Bug #21598 (New): Local keepstore invoked by crunch-run should never do EmptyTrash work https://dev.arvados.org/issues/21598 2024-03-15T18:32:48Z Tom Clegg tom@curii.com
<p>We don't want N compute nodes periodically checking expiry times on all of the trashed blocks on all backend volumes.</p>
Arvados - Feature #21578 (Resolved): Add debug logging option to arvados-client mount https://dev.arvados.org/issues/21578 2024-03-11T15:38:01Z Tom Clegg tom@curii.com
<p>When invoked as</p>
<pre><code>arvados-client mount --log-level=debug ...</code></pre>
<p>and an error code is returned to a fuse API call ("I/O error"), the original (typically much more informative) error message should also be logged to the terminal.</p>
Arvados - Task #20437 (New): Review https://dev.arvados.org/issues/20437 2023-04-26T16:06:51Z Tom Clegg tom@curii.com
Arvados - Task #20436 (New): Review https://dev.arvados.org/issues/20436 2023-04-26T16:06:27Z Tom Clegg tom@curii.com
Arvados - Feature #19860 (In Progress): Support "pull image" container request https://dev.arvados.org/issues/19860 2022-12-07T19:55:14Z Tom Clegg tom@curii.com
<p>See <a class="wiki-page" href="https://dev.arvados.org/projects/arvados/wiki/Build_docker_images_as_part_of_a_workflow">Build docker images as part of a workflow</a></p>
<p>A container request like this</p>
<pre><code class="yaml syntaxhl"><span class="na">container_request</span><span class="pi">:</span>
<span class="na">container_image</span><span class="pi">:</span> <span class="s2">"</span><span class="s">arvados/builtin"</span>
<span class="na">command</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">docker"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">pull"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">examplerepo:tag"</span><span class="pi">]</span>
<span class="na">mounts</span><span class="pi">:</span> <span class="pi">{}</span>
<span class="na">runtime_constraints</span><span class="pi">:</span>
<span class="na">API</span><span class="pi">:</span> <span class="no">true</span>
<span class="na">RAM</span><span class="pi">:</span> <span class="m">1000000000</span>
<span class="na">output_path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">/"</span>
</code></pre>
<p>should pull the indicated image from docker hub and save it to a collection, appropriately tagged, so it can be used in a container request like this</p>
<pre><code class="yaml syntaxhl"><span class="na">container_request</span><span class="pi">:</span>
<span class="na">container_image</span><span class="pi">:</span> <span class="s2">"</span><span class="s">examplerepo:tag"</span>
<span class="c1"># ...</span>
</code></pre>
Arvados - Task #16098 (Resolved): Review 12308-cgofuse https://dev.arvados.org/issues/16098 2020-01-29T17:22:23Z Tom Clegg tom@curii.com
Arvados - Idea #12308 (Resolved): [FUSE] Golang-based fuse driver https://dev.arvados.org/issues/12308 2017-09-22T18:15:19Z Tom Clegg tom@curii.com
<p>Background:</p>
<p>Python+llfuse was expedient and has done lots of good work for us, but it's not promising as a long-term (fast+reliable+maintainable) solution.</p>
Implementation:
<ul>
<li>collection-backed filesystem from <a class="issue tracker-2 status-3 priority-4 priority-default closed parent" title="Feature: [keep-web] writable webdav (Resolved)" href="https://dev.arvados.org/issues/12483">#12483</a>, plus more general arvados-backed filesystem ("by_id" directory, etc, same as the one exported via webdav) from <a class="issue tracker-6 status-3 priority-4 priority-default closed parent" title="Idea: [WebDAV] Support browsing of project hierarchies (Resolved)" href="https://dev.arvados.org/issues/13111">#13111</a></li>
<li>present as fuse using a library like <a class="external" href="https://godoc.org/bazil.org/fuse">https://godoc.org/bazil.org/fuse</a> or <a class="external" href="https://godoc.org/github.com/billziss-gh/cgofuse/fuse">https://godoc.org/github.com/billziss-gh/cgofuse/fuse</a></li>
<li>package as a subcommand ("mount") of the <a class="source" href="https://dev.arvados.org/projects/arvados/repository/arvados/entry/cmd/arvados-client">source:cmd/arvados-client</a> program</li>
</ul>
TBD:
<ul>
<li>Approach for handling websocket "update" events</li>
<li>Selectable mechanisms/options for syncing to server (fflush, fsync, close) (on a shell node, flush-on-close, flush-periodically, or flush-after-idle-time might be best; in crunch-run, flush-on-exit might be best)</li>
<li>Desired behavior when updates conflict (write error? clobber? create "oops,clobbered" file?)</li>
</ul>
Other current bugs/limitations:
<ul>
<li>Old keep block signatures don't get refreshed, so reading a collection that's been cached for too long returns an I/O error</li>
<li>Not command-line compatible with arv-mount</li>
<li>Logging is not great</li>
<li>No docs</li>
<li>No way to control overall cache size (currently collectionfs can use lots of RAM in certain non-sequential write scenarios; we need the ability to trade speed for space efficiency in memory-constrained environments)</li>
<li>No warnings given when cache is thrashing</li>
<li>No application level instrumentation (just optional Go pprof)</li>
<li>Special <code>.arvados#collection</code> file is incomplete (has manifest_text but not uuid, pdh)</li>
<li>No automatic flush on sigint/sigterm</li>
<li>No warning given when trying to exit but filesystem can't be unmounted yet (filehandle is open, or a process's cwd is in the mount)</li>
<li>Mac port has a race bug (see notes below)</li>
<li>Windows port is untested</li>
<li>Cross-compiling recipe for Mac/Windows ports is fragile</li>
<li>chmod is a no-op (chmod 0700 succeeds, but the file mode will still be 0755)</li>
</ul>
Arvados - Bug #3989 (Closed): [Workbench] [DRAFT] Fix clock/node time reporting on Workbench pipe... https://dev.arvados.org/issues/3989 2014-09-25T13:59:02Z Tom Clegg tom@curii.com
Including:
<ul>
<li>No figures should include time spent queued</li>
<li>When jobs are reused in a pipeline, time of reused jobs should be reported separately, and not included in any of the pipeline's other time stats</li>
<li>Pipeline/job timing figures should not mislabel "sum of wallclock time of each task" as "CPU time" </li>
<li>Fix reporting for failed jobs (see <a class="issue tracker-5 status-5 priority-4 priority-default closed child" title="Task: Fix reported scaling factor for failed jobs (see notes) (Closed)" href="https://dev.arvados.org/issues/5651">#5651</a> notes)</li>
</ul>
Arvados - Bug #3691 (Closed): [Workbench] Suppress "ajax failed" error messages when ajax fails d... https://dev.arvados.org/issues/3691 2014-08-25T18:40:47Z Tom Clegg tom@curii.com
<p>In fail callbacks, check <code>jqxhr.readyState</code> and <code>status == 'abort'</code> to determine whether the request was terminated by a user action (like navigating away from the current page) rather than an application/network error.</p>
In this case, the correct behavior is one of the following:
<ul>
<li>Show a distinct message like "(cancelled)" that doesn't look like an error, or</li>
<li>Don't display anything, just clean up internal state and stop.</li>
</ul>
Arvados - Task #2822 (Closed): [SDKs] Use Ruby SDK instead of google-api-client in arv-run-pipeli... https://dev.arvados.org/issues/2822 2014-05-13T13:03:56Z Tom Clegg tom@curii.com
<p><strong>Acceptance criteria</strong></p>
<p>As the title says: arv-run-pipeline-instance does not instantiate a Google API client object directly. Any Arvados API objects it instantiates originate from the Ruby SDK.</p>