Arvados: Issueshttps://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422024-03-25T16:52:07ZArvados
Redmine Arvados - Bug #21618 (New): cloudtest should give up if test instance disappears from listing bef...https://dev.arvados.org/issues/216182024-03-25T16:52:07ZTom Cleggtom@curii.com
<p>Currently, if an instance/image has a problem that causes it to shut down before responding to a boot probe, cloudtest keeps probing after it disappears, which is clearly futile.</p> Arvados - Bug #21617 (In Progress): Timeout error reading content from collection on a remote clu...https://dev.arvados.org/issues/216172024-03-25T14:43:50ZTom Cleggtom@curii.com
In a 3-way federation with login cluster z1111:
<ul>
<li>a collection stored on z1111 can be read from z2222 (e.g., workbench.z2222/collections/z1111-4zz18-...)</li>
<li>a collection stored on z2222 cannot be read from z1111 (timeout)</li>
<li>a collection stored on z2222 cannot be read from z3333 (timeout)</li>
</ul>
<p>It looks like the intermediate cluster's keepstore process cannot retrieve the list of keep services from the cluster where the data is stored ("failed to validate remote token") -- this auto-retries in the background for a while, then eventually blockReadRemote gives up.</p>
<p>Manual testing, with jutro/tordo/pirca playing the roles of z1111/z2222/z3333, indicates the same problem existed before and after <a class="issue tracker-2 status-2 priority-4 priority-default parent" title="Feature: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress)" href="https://dev.arvados.org/issues/2960">#2960</a> was merged and deployed to tordo.</p> Arvados - Bug #21598 (In Progress): Local keepstore invoked by crunch-run should never do EmptyTr...https://dev.arvados.org/issues/215982024-03-15T18:32:48ZTom Cleggtom@curii.com
<p>We don't want N compute nodes periodically checking expiry times on all of the trashed blocks on all backend volumes.</p> Arvados - Bug #21314 (New): a-d-c should cancel a container if it can't be loadedhttps://dev.arvados.org/issues/213142023-12-21T16:55:13ZTom Cleggtom@curii.com
<p>If a container's "mounts" field is invalid, a-d-c logs this, and keeps trying.</p>
<code class="json syntaxhl"><span class="p">{</span><span class="nl">"ClusterID"</span><span class="p">:</span><span class="s2">"irdev"</span><span class="p">,</span><span class="nl">"ContainerUUID"</span><span class="p">:</span><span class="s2">"<a href="https://arvadosapi.com/xxxxx-dz642-xxxxxxxxxxxxxxx">xxxxx-dz642-xxxxxxxxxxxxxxx</a>"</span><span class="p">,</span><span class="nl">"PID"</span><span class="p">:</span><span class="mi">2037423</span><span class="p">,</span><span class="nl">"error"</span><span class="p">:</span><span class="s2">"json: cannot unmarshal array into Go struct field Container.mounts of type arvados.Mount"</span><span class="p">,</span><span class="nl">"level"</span><span class="p">:</span><span class="s2">"warning"</span><span class="p">,</span><span class="nl">"msg"</span><span class="p">:</span><span class="s2">"error getting mounts"</span><span class="p">,</span><span class="nl">"time"</span><span class="p">:</span><span class="s2">"2023-12-13T20:34:41.064140517Z"</span><span class="p">}</span><span class="w">
</span></code>
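<p>A minimal sketch of treating a load error like the one logged above as permanent rather than retryable. It assumes <code>mounts</code> must decode to an object whose values are themselves objects. The names <code>PermanentLoadError</code>, <code>parse_mounts</code> and <code>next_action</code> are hypothetical and are not part of a-d-c.</p>

```python
import json


class PermanentLoadError(Exception):
    """Raised when a container record can never be loaded as-is."""


def parse_mounts(raw_json):
    # Hypothetical helper. Assumption: "mounts" is a JSON object mapping
    # mount paths to mount definitions, each of which is a JSON object.
    record = json.loads(raw_json)
    mounts = record.get("mounts")
    if not isinstance(mounts, dict):
        raise PermanentLoadError("mounts is not an object")
    for path, mount in mounts.items():
        if not isinstance(mount, dict):
            raise PermanentLoadError(f"mount {path!r} is not an object")
    return mounts


def next_action(raw_json):
    # "cancel" when retrying cannot help, "run" when the record loads.
    try:
        parse_mounts(raw_json)
    except PermanentLoadError:
        return "cancel"
    return "run"
```

<p>The point of the separate exception type is that only errors known to be permanent lead to cancellation; transient failures (e.g., network errors) would still be retried.</p>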
<p>In this situation, the offending container should be cancelled.</p> Arvados - Bug #19081 (In Progress): Possible bug passing cmd line arguments with spaces to singul...https://dev.arvados.org/issues/190812022-04-28T20:06:56ZPeter Amstutzpeter.amstutz@curii.com
<p>A customer reported that a job that worked correctly with the Docker runtime did not work with the Singularity runtime.</p>
<p>The command line looked like this:</p>
<p><code>["/bin/bash", "-c", "command1 --option1 --option2"]</code></p>
<p>However, it acts as if it were invoked as</p>
<p><code>/bin/bash -c command1</code></p>
<p>or possibly</p>
<p><code>/bin/bash -c command1 --option1 --option2</code></p>
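<p>The hypothesized failure mode can be illustrated with a short sketch. It only shows how an argument vector loses its grouping when joined with spaces and re-parsed by a shell; whether the Singularity code path actually does this is an assumption, not something confirmed here.</p>

```python
import shlex

argv = ["/bin/bash", "-c", "command1 --option1 --option2"]

# Naive joining drops the boundary around the third argument, so a shell
# that re-parses the string sees five words instead of three.
naive = " ".join(argv)

# shlex.join quotes each argument, so re-parsing restores the original vector.
quoted = shlex.join(argv)

print(shlex.split(naive))
print(shlex.split(quoted))
```

<p>With the naive form, bash receives only <code>command1</code> as the <code>-c</code> script and treats the remaining words as positional parameters, which matches the reported symptom.</p>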
<p>Further bolstering this hypothesis, the workaround was to not run it as a shell command (which seems to have been unnecessary anyway). This worked as expected:</p>
<p><code>["command1", "--option1", "--option2"]</code></p> Arvados - Bug #16888 (In Progress): Federate container token cannot access resources on other clu...https://dev.arvados.org/issues/168882020-09-25T17:50:58ZPeter Amstutzpeter.amstutz@curii.com
<p><a class="external" href="https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-ios1sk1hbcj8knc">https://workbench.tordo.arvadosapi.com/container_requests/tordo-xvhdp-ios1sk1hbcj8knc</a></p>
<p>This fails, despite the fact that when accessing the collection by other means (both "arv collection get" and arv-mount) the user is able to go through tordo and fetch the collection from ce8i5 (i.e. federation works as intended).</p>
<p>I think what is happening here is that the container gets issued a new temporary token. That token belongs to the federated cluster, not the LoginCluster, so it can only be used to access resources on the federated cluster and not on other clusters in the federation.</p>
<p>So that's a bug / missing feature in this situation.</p>
<p>When the user's token belongs to a LoginCluster, controller needs to request a new token from the LoginCluster instead of creating a local one. This should be set as the "runtime token" on the container request, along with a new(?) flag to indicate whether the runtime token should be expired when the container request is finished.</p> Arvados - Bug #11679 (New): [Workbench] Logs containers with undefined exit codehttps://dev.arvados.org/issues/116792017-05-11T14:23:43ZPeter Amstutzpeter.amstutz@curii.com
<p>Workbench logs container state change messages. When a container is complete, it looks at the exit code and reports success or failure. However, for some reason, the exit code is reported as undefined, which is not zero, and it reports failure (I've seen this with both exit code 1 and exit code 0):</p>
<pre>
2017-05-11T14:12:36.420834199Z Waiting for container to finish
2017-05-11T14:18:49.777935199Z Container exited with code: 1
2017-05-11T14:18:49.993315999Z Complete
2017-05-11T14:18:50.791787Z Container <a href="https://arvadosapi.com/qr1hi-dz642-hre34ora0zbr9k1">qr1hi-dz642-hre34ora0zbr9k1</a> finished with exit code undefined (failure)
</pre> Arvados - Bug #11460 (In Progress): [SDK] avoid interfering with socket open/close - use pycurl s...https://dev.arvados.org/issues/114602017-04-12T14:38:20ZTom Cleggtom@curii.comTapestry - Bug #6924 (In Progress): Google survey participation link should work for newly create...https://dev.arvados.org/issues/69242015-08-06T17:21:58ZTom Cleggtom@curii.com
<p>See <a class="changeset" title="Apparently Google Forms has been changing the url pattern to prefill form fields. This appears to..." href="https://dev.arvados.org/projects/tapestry/repository/tapestry/revisions/e70c808ab4518cf68ae56f702ae64f5232bbc0ba">e70c808ab4518cf68ae56f702ae64f5232bbc0ba</a></p>
<p>My theory:</p>
<pre>
# Google changed the field IDs on existing forms/results. Old
# field IDs were small multiples of 10, and were changed from N to
# 1000000+N; new ones are big numbers and should be used verbatim.
</pre>
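<p>The theory above could be expressed as a rule that offsets only legacy field IDs. This is a sketch, not the Tapestry code: the function name is hypothetical, and the cutoff for "legacy" is an assumption, since the theory only says old IDs were small multiples of 10.</p>

```python
LEGACY_OFFSET = 1000000

# Assumption: any ID below the offset is a legacy ID. The source only says
# legacy IDs were "small multiples of 10" and new ones are "big numbers".
LEGACY_MAX = LEGACY_OFFSET


def prefill_field_id(field_id):
    """Return the ID to use in a prefill URL for the given form field ID."""
    if field_id < LEGACY_MAX:
        # Legacy field: Google renumbered N to 1000000+N.
        return field_id + LEGACY_OFFSET
    # Newly created field: use the ID verbatim.
    return field_id
```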
<p>Current code <em>always</em> adds 1000000. According to Nancy's experiments, this means in order to conduct a survey with a newly created Google form, you have to enter "2014123456" when the real form field ID is "2015123456".</p> Arvados - Bug #5560 (New): [DRAFT] [API] Good API for accessing the old_ and new_attributes in Lo...https://dev.arvados.org/issues/55602015-03-25T05:31:43ZTom Cleggtom@curii.com
Problems with the <code>old_attributes</code> and <code>new_attributes</code> hashes in the properties hash:
<ul>
<li>They don't look quite close enough to "what the API response to GET would have looked like at the time" for clients to reuse code to interpret/display them (e.g., timestamp formats can be different, computed properties are not present, locators in manifests are not signed).</li>
<li>The <code>*_attributes</code> hashes can be huge (notably for collection updates, where there are two copies of the manifest). This uses a lot of database space, and (worse) makes it very slow for clients to retrieve logs (unless they use <code>select</code> to avoid retrieving <em>any</em> properties).</li>
<li>Even the associated <code>old_etag</code> and <code>new_etag</code> fields are not indexable or searchable.</li>
</ul>
<p>Possible approach:</p>
<p>Add a version table (either a single one, or a table per model type), indexed by etag. Store the attributes in the version table. In the logs table, just store the old and new etags. Provide a distinct API for retrieving a specific version of an object (in the usual API response format for that object type) by giving its etag. Optionally, provide an API for retrieving the logs <em>and</em> the object versions referenced by the logs in one request (this would be as complete as the current behavior -- but that isn't necessarily important).</p> Arvados - Bug #5523 (New): [Crunch] crunchstat should not report errors during normal timing raceshttps://dev.arvados.org/issues/55232015-03-20T18:05:58ZPeter Amstutzpeter.amstutz@curii.com
<p>Container stat files appear and disappear in normal operation. In the "normal" cases, such events should not be logged (let alone as an error).</p>
We expect zero or one episode of "cannot find stats file" when cidfile != "" and we're collecting stats for the first time.
<ul>
<li>If the first collection attempt for a given statistic results in "cannot find file", we should block in OpenStatFile and poll quickly over a short interval (say, every 100ms, max 1s) because we probably just won the race with the container setup process.</li>
<li>If the stat files don't show up within that max interval (~1s) it means something is wrong, and this should (still) be logged.</li>
</ul>
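<p>The first-collection behavior described above can be sketched as a bounded poll. crunchstat itself is not written in Python; this only illustrates the timing logic, and the function and parameter names are illustrative.</p>

```python
import time


def open_stat_file(path, interval=0.1, max_wait=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Open path, retrying every `interval` seconds for up to `max_wait` seconds.

    Re-raises FileNotFoundError if the file has not appeared by the deadline;
    that is the case the caller should still log.
    """
    deadline = clock() + max_wait
    while True:
        try:
            return open(path)
        except FileNotFoundError:
            if clock() >= deadline:
                raise
            # Probably just won the race with container setup: wait briefly.
            sleep(interval)
```

<p>Passing <code>sleep</code> and <code>clock</code> as parameters keeps the retry loop testable without real delays.</p>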
We expect zero or one episode of "stats file disappeared" when cidfile != "" and we happen to poll between container shutdown and (crunchstat's) child exit. For a given statistic:
<ul>
<li>The first time this occurs, we should not log anything.</li>
<li>The second time this occurs, we should log "warning: stats file disappeared {duration} ago, but child has not exited".</li>
<li>The third+ time this occurs, we should not log anything.</li>
<li>If the stat file reappears, we should reset the "went missing" counter to zero.</li>
</ul> Tapestry - Bug #1471 (New): Researcher > Kits table content loads very slowlyhttps://dev.arvados.org/issues/14712013-05-07T12:07:39ZTom Cleggtom@curii.comGET-Evidence - Bug #1104 (New): Division by zero warning when computing progress percentagehttps://dev.arvados.org/issues/11042012-08-18T18:45:20ZTom Cleggtom@curii.com
<p>JSON::ParserError: 743: unexpected token at '<br /><br /><b>Warning</b>: Division by zero in <b>/home/get-evidence/public_html/lib/genome_display.php</b> on line <b>289</b><br />
{"status":{"progress":0,"status":"map <a class="issue tracker-1 status-3 priority-4 priority-default closed" title="Bug: [SDKs] arv pipeline_instance --help should work even if api server is not running / reachable. (Resolved)" href="https://dev.arvados.org/issues/4041">#4041</a> - 0(+0)\/0","logfilename":false,"result_url":"http:\/\/evidence.personalgenomes.org\/genomes?display_genome_id=c5a9e34e0e82e7c362218954d3160ffb82dc2171"}}'</p> GET-Evidence - Bug #500 (New): Trait-o-matic GET-E importer "AttributeError: 'tuple' object has n...https://dev.arvados.org/issues/5002010-05-17T18:09:59ZTom Cleggtom@curii.com
<p>Some input files cause gff_get-evidence_map.py to crash:</p>
<pre>
==> /scratch/tmp/811786ad1ae74adfdd20dd0372abaaebc6246e343aebd01da0bfc4c02bf0106c-out/lock <==
Traceback (most recent call last):
File "/home/trait/core/gff_get-evidence_map.py", line 295, in <module>
main()
File "/home/trait/core/gff_get-evidence_map.py", line 242, in main
leftover_alleles.remove(ref_allele)
AttributeError: 'tuple' object has no attribute 'remove'
</pre> GET-Evidence - Bug #494 (New): Warning about automatic logouthttps://dev.arvados.org/issues/4942010-05-10T17:42:58ZMadeleine Ballmpball@gmail.com
<p>When the program automatically logs out, no warning appears on the screen and any data entered is lost upon trying to save the changes.</p>