Arvados: Issues
https://dev.arvados.org/
2023-12-21T16:55:13Z, Arvados (Redmine)

Arvados - Bug #21314 (New): a-d-c should cancel a container if it can't be loaded
https://dev.arvados.org/issues/21314
2023-12-21T16:55:13Z, Tom Clegg (tom@curii.com)
<p>If a container's "mounts" field is invalid, a-d-c logs this, and keeps trying.</p>
<code class="json syntaxhl">{"ClusterID":"irdev","ContainerUUID":"xxxxx-dz642-xxxxxxxxxxxxxxx","PID":2037423,"error":"json: cannot unmarshal array into Go struct field Container.mounts of type arvados.Mount","level":"warning","msg":"error getting mounts","time":"2023-12-13T20:34:41.064140517Z"}
</code>
<p>In this situation, the offending container should be cancelled.</p>

Arvados - Bug #21187 (New): a-c-r should detect and warn when arv:IntermediateOutput outputTTL is...
https://dev.arvados.org/issues/21187
2023-11-09T19:31:33Z, Tom Clegg (tom@curii.com)
<p>Currently, if outputTTL is set too low and a workflow tries to use intermediate data after it has already been trashed, a-c-r may read an intermediate collection manifest successfully (before trash time) but then fail to save it later (after trash time) in a combined collection. In that case the user ends up with a Python stack trace ending in a 403 error (invalid blob signature).</p>
<p>a-c-r should warn the user when the current workflow's running time exceeds outputTTL. This is probably a good indicator that the user should increase outputTTL, even if nothing has actually broken yet.</p>
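<p>Both checks discussed in this issue reduce to small time comparisons. The following is a minimal sketch, not a-c-r's actual code: the function names are hypothetical, the signature hint is assumed to look like <code>+A&lt;hex signature&gt;@&lt;hex unix expiry&gt;</code>, and an outputTTL of zero is assumed to mean that intermediate output is kept indefinitely.</p>

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed shape of a signature hint inside a block locator:
# "+A<hex signature>@<hex unix expiry>".
SIGNATURE_HINT = re.compile(r"\+A[0-9a-f]+@([0-9a-f]+)")


def ttl_exceeded(started_at, now, output_ttl_seconds):
    """Return True if the workflow has been running longer than outputTTL."""
    if output_ttl_seconds <= 0:
        # Assumption: zero means intermediate output is never trashed.
        return False
    return now - started_at > timedelta(seconds=output_ttl_seconds)


def earliest_signature_expiry(manifest_text):
    """Return the earliest signature expiry found in the manifest, or None."""
    stamps = [int(m.group(1), 16) for m in SIGNATURE_HINT.finditer(manifest_text)]
    if not stamps:
        return None
    return datetime.fromtimestamp(min(stamps), tz=timezone.utc)
```

<p>A caller could use <code>ttl_exceeded</code> to emit the warning while the workflow runs, and compare <code>earliest_signature_expiry</code> against the current time after a failed create call to decide whether to report expired signatures instead of a bare 403.</p>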
<p>a-c-r should also report a more helpful error message when it fails to create a collection due to expired blob signatures. This could be done by checking for a 403 error from the create call and/or checking the expiry times (given as hexadecimal unix times) on the blob signatures in the manifest text.</p>

Arvados - Feature #19889 (Resolved): access current container logs at /arvados/v1/containers/{uui...
https://dev.arvados.org/issues/19889
2022-12-28T20:36:52Z, Tom Clegg (tom@curii.com)
<p>See <a class="wiki-page" href="https://dev.arvados.org/projects/arvados/wiki/Efficient_live_access_to_container_logs">Efficient live access to container logs</a></p>
<p>Provide access to the logs for a locked/running container, including the portion that has not yet been flushed to keep.</p>
Included:
<ul>
<li>read-only webdav handler in crunch-run (may involve refactoring/exporting some parts of keep-web to avoid copying code)</li>
<li>webdav handler in controller that proxies to crunch-run gateway, keep-web (webdav.InternalURLs), or an "empty collection" stub, depending on whether container is active/finished</li>
</ul>
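<p>The proxy choice described in the list above can be sketched as a small decision function. This is only an illustration: the function name, the state strings, and the exact rule (gateway for locked/running containers, keep-web once a log collection exists, stub otherwise) are assumptions, not the controller's actual logic.</p>

```python
from enum import Enum


class Backend(Enum):
    CRUNCH_RUN_GATEWAY = "crunch-run gateway"
    KEEP_WEB = "keep-web"
    EMPTY_STUB = "empty collection stub"


def choose_log_backend(state, gateway_address, log_collection):
    """Pick where a log request is proxied (hypothetical decision rule)."""
    # An active container with a reachable gateway can serve live logs,
    # including the part not yet flushed to Keep.
    if state in ("Locked", "Running") and gateway_address:
        return Backend.CRUNCH_RUN_GATEWAY
    # Otherwise fall back to whatever log collection has been saved.
    if log_collection:
        return Backend.KEEP_WEB
    # Nothing to show yet: behave like an empty collection.
    return Backend.EMPTY_STUB
```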
Not included:
<ul>
<li>log_events API</li>
<li>workbench2</li>
</ul>

Arvados - Feature #18790 (Resolved): Access live container logs through arvados-client and crunch...
https://dev.arvados.org/issues/18790
2022-02-18T15:28:36Z, Tom Clegg (tom@curii.com)
<p>A command like this should show live logs on stdout, and exit when the container finishes:</p>
<pre><code>$ arvados-client logs $container_request_uuid</code></pre>
<p>To be implemented using <a class="wiki-page" href="https://dev.arvados.org/projects/arvados/wiki/Efficient_live_access_to_container_logs">Efficient live access to container logs</a></p>
For now:
<ul>
<li>Show content from crunch-run.txt, stderr.txt, stdout.txt</li>
<li>No headers/prefixes showing which file the logs are coming from</li>
<li>Uses the endpoint on controller <code>/arvados/v1/container_requests/{uuid}/log/{container_uuid}/{file}</code> to poll the files.</li>
</ul>
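<p>The polling behaviour listed above can be sketched as follows. This is not the arvados-client implementation: <code>read_from</code>, <code>is_finished</code> and <code>write</code> are hypothetical callables supplied by the caller, and reading each file from a byte offset is an assumption about how new content would be fetched.</p>

```python
import time

LOG_FILES = ("crunch-run.txt", "stderr.txt", "stdout.txt")


def log_path(request_uuid, container_uuid, filename):
    """Build the controller path for one log file."""
    return (
        f"/arvados/v1/container_requests/{request_uuid}"
        f"/log/{container_uuid}/{filename}"
    )


def follow_logs(request_uuid, container_uuid, read_from, is_finished, write,
                interval=1.0, sleep=time.sleep):
    """Copy new log content to `write` until the container finishes.

    read_from(path, offset) -> bytes and is_finished() -> bool are
    hypothetical callables; no file-name prefixes are added to the output.
    """
    offsets = {name: 0 for name in LOG_FILES}
    while True:
        # Check the state first, so one more read pass runs after the
        # container finishes and trailing output is not lost.
        finished = is_finished()
        for name in LOG_FILES:
            chunk = read_from(log_path(request_uuid, container_uuid, name),
                              offsets[name])
            if chunk:
                offsets[name] += len(chunk)
                write(chunk)
        if finished:
            return
        sleep(interval)
```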
<p>(description updated after the fact to reflect API change, see <a class="issue tracker-2 status-3 priority-4 priority-default closed parent" title="Feature: Access live container logs through arvados-client and crunch-run container gateway (Resolved)" href="https://dev.arvados.org/issues/18790#note-23">#18790#note-23</a>)</p>

Arvados - Bug #18334 (New): Accept release info changes in docker recipes
https://dev.arvados.org/issues/18334
2021-11-04T15:02:18Z, Tom Clegg (tom@curii.com)
<p>In some circumstances, "apt-get update" stops working because a newer Debian release has been published, which changes the release info (for example the Suite value) of the existing repositories.</p>
<p>This can break cmd/arvados-package tests.</p>
<pre>
$ docker run --rm -it arvados-package-deps-debian:10 bash
root@7d1560822db7:/# apt-get update
Get:1 http://deb.debian.org/debian buster InRelease [122 kB]
Get:2 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:3 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Reading package lists... Done
E: Repository 'http://security.debian.org/debian-security buster/updates InRelease' changed its 'Suite' value from 'stable' to 'oldstable'
N: This must be accepted explicitly before updates for this repository can be applied. See apt-secure(8) manpage for details.
N: Repository 'http://deb.debian.org/debian buster InRelease' changed its 'Version' value from '10.9' to '10.11'
E: Repository 'http://deb.debian.org/debian buster InRelease' changed its 'Suite' value from 'stable' to 'oldstable'
N: This must be accepted explicitly before updates for this repository can be applied. See apt-secure(8) manpage for details.
E: Repository 'http://deb.debian.org/debian buster-updates InRelease' changed its 'Suite' value from 'stable-updates' to 'oldstable-updates'
N: This must be accepted explicitly before updates for this repository can be applied. See apt-secure(8) manpage for details.
</pre>
<p>Proposed fix: "apt-get --allow-releaseinfo-change update" in scripts.</p>

Arvados - Feature #18113 (Resolved): [a-d-c] non-zero defaults for MaxCloudOpsPerSecond and MaxCo...
https://dev.arvados.org/issues/18113
2021-09-07T18:03:33Z, Tom Clegg (tom@curii.com)
<p>Instead of 0 (unlimited) we could start with values that are reasonable for a normal size production cluster / typical cloud account.</p>
<p>MaxConcurrentInstanceCreateOps was invented to accommodate Azure limitations, but now that we have it, we might as well promote using it as a guard rail for all cloud providers.</p>
<p>Perhaps 10 ops per second and 1 concurrent create op. Recommend raising to 20 concurrent create ops for Azure since it doesn't return the "create" response until the instance has booted.</p>

Arvados - Bug #17529 (Resolved): [a-d-c] AWS/EC2 driver should return a RateLimitError to dispatc...
https://dev.arvados.org/issues/17529
2021-04-13T15:24:30Z, Tom Clegg (tom@curii.com)
<p>Current code results in error logs like this:</p>
<code class="json syntaxhl">{"InstanceType":"c5large.spot","PID":2231,"error":"RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 778d5b22-90e4-4a48-8f09-6946e8edcb2c","level":"error","msg":"create failed","time":"2021-03-30T13:38:38.925565910Z"}
</code>
<p>...but the returned error does not implement the lib/cloud.RateLimitError interface, so the dispatcher doesn't back off.</p>

Arvados - Bug #16795 (Resolved): [a-d-c] flaky test
https://dev.arvados.org/issues/16795
2020-09-01T18:50:42Z, Tom Clegg (tom@curii.com)
<p><a class="external" href="https://ci.arvados.org/job/run-tests-remainder/3934/consoleText">run-tests-remainder: #3934 (consoleText)</a></p>
<pre>
dispatcher_test.go:212:
c.Check(resp.Body.String(), check.Matches, `(?ms).*boot_outcomes{outcome="aborted"} 0.*`)
...
... "arvados_dispatchcloud_boot_outcomes{outcome=\"aborted\"} 15\n" +
</pre>

Arvados - Task #11689 (Resolved): Accept index request for a specific mount
https://dev.arvados.org/issues/11689
2017-05-12T14:56:21Z, Tom Clegg (tom@curii.com)

Arvados - Task #9235 (Resolved): Accept keep services list in config
https://dev.arvados.org/issues/9235
2016-05-18T19:23:19Z, Tom Clegg (tom@curii.com)

Arvados - Task #7580 (Closed): Accept CORS requests
https://dev.arvados.org/issues/7580
2015-10-15T20:36:37Z, Tom Clegg (tom@curii.com)

Arvados - Task #5746 (Resolved): Accept readonly flag in -volumes=... argument
https://dev.arvados.org/issues/5746
2015-04-15T21:23:45Z, Tom Clegg (tom@curii.com)

Arvados - Task #4059 (Resolved): Accept arbitrary git url as repository if --git-dir not given
https://dev.arvados.org/issues/4059
2014-10-01T20:25:39Z, Tom Clegg (tom@curii.com)
<p>(If --git-dir given, do not do any git stuff anywhere else.)</p>

Arvados - Bug #3346 (Closed): ActiveRecord::SaveFailed exception does not include useful error me...
https://dev.arvados.org/issues/3346
2014-07-24T17:43:01Z, Tom Clegg (tom@curii.com)

Arvados - Task #2787 (Resolved): Accept manifests with signature tokens during collections.create
https://dev.arvados.org/issues/2787
2014-05-07T16:15:08Z, Tom Clegg (tom@curii.com)