Arvados: Issues https://dev.arvados.org/ https://dev.arvados.org/favicon.ico?1557688842 2019-10-07T18:00:02Z Arvados Redmine
Arvados - Bug #15695 (Closed): [a-d-c] Long delay before cloud dispatcher starts jobs on playground https://dev.arvados.org/issues/15695 2019-10-07T18:00:02Z Tom Morris tfmorris@veritasgenetics.com
<p>This workflow wants 100 parallel jobs running the same code over different data.</p>
<p>There are two separate runs shown in the Prometheus graph below:<br /><a class="external" href="https://prometheus.curoverse.com/consoles/qr1hi/index.html#pctc%7B%22duration%22%3A7200%2C%22endTime%22%3A1570223940%7D">https://prometheus.curoverse.com/consoles/qr1hi/index.html#pctc%7B%22duration%22%3A7200%2C%22endTime%22%3A1570223940%7D</a></p>
<p>The timeline is (all times UTC):<br />19:24 First run submitted with requirements for 100 x 4 core nodes - <a class="external" href="https://workbench.qr1hi.arvadosapi.com/container_requests/qr1hi-xvhdp-ctw1t6m8z718emc">https://workbench.qr1hi.arvadosapi.com/container_requests/qr1hi-xvhdp-ctw1t6m8z718emc</a><br />19:41 17 nodes with 4 cores each started<br />19:57 Workflow cancelled<br />19:57 75 nodes idle<br />19:58 Second run submitted with edited runtime requirements for 100 x 2 core nodes - <a class="external" href="https://workbench.qr1hi.arvadosapi.com/container_requests/qr1hi-xvhdp-2ry6g3l031wlygu">https://workbench.qr1hi.arvadosapi.com/container_requests/qr1hi-xvhdp-2ry6g3l031wlygu</a><br />20:00 71 nodes idle from 1st cancelled workflow<br />20:03 1 node busy, 0 nodes idle<br />20:26 1st node child container started<br />20:31 2nd node for child container started<br />20:36 39 nodes booting for child containers<br />20:40 another 22 nodes start booting<br />20:46 final 7 nodes start booting<br />20:55 All 100 containers finally running</p>
Arvados - Idea #15493 (Duplicate): Allow admin to configure Unix account id https://dev.arvados.org/issues/15493 2019-07-24T14:53:58Z Tom Morris tfmorris@veritasgenetics.com
<p>Rather than generating a Unix ID for the shell account arbitrarily, use a value provided by the authentication/directory service.</p>
<p>In the case of Google auth, this could be provided in the form of a specially formatted alternative email address registered with Google.</p>
Arvados Workbench 2 - Idea #15425 (Resolved): Add "search all versions" checkbox to Search UI https://dev.arvados.org/issues/15425 2019-06-26T17:54:58Z Tom Morris tfmorris@veritasgenetics.com
<p>The default search includes only the current versions. Add a checkbox to allow users to search all versions.</p>
Arvados - Idea #13790 (Resolved): Add metrics endpoint to arvados-controller https://dev.arvados.org/issues/13790 2018-07-11T15:08:48Z Tom Morris tfmorris@veritasgenetics.com
<p>Record time to status and request duration:</p>
<ul>
<li>local requests only</li>
<li>federated requests only</li>
</ul>
<p>If separating out request types is too hard, just add the basic request timings.</p>
<p>Use the Prometheus client library for Go.</p>
Arvados - Idea #13219 (Resolved): Allow user to specify time limit for submitted jobs https://dev.arvados.org/issues/13219 2018-03-14T19:31:12Z Tom Morris tfmorris@veritasgenetics.com
<p>Allow a user to specify a maximum amount of run time which can be used by a job before it is cancelled.</p>
<p>Involves:</p>
<ul>
<li>Document "time_limit" in container_request/container scheduling_parameters (time in seconds)</li>
<li>crunch-dispatch-slurm passes "--time" parameter to sbatch</li>
<li>crunch-run also stops the container after it exceeds time_limit (so the feature works independently of slurm, e.g. with crunch-dispatch-local)</li>
<li>Support TimeLimit requirement in arvados-cwl-runner (upcoming feature of CWL v1.1)</li>
</ul>
Arvados - Idea #12239 (New): Allow templating of the collection sharing web page https://dev.arvados.org/issues/12239 2017-09-12T19:25:45Z Tom Morris tfmorris@veritasgenetics.com
<p>As a system administrator for a cluster with custom branding, I would like to use templating on the collection sharing download page to match the branding on the rest of my web site.</p>
Arvados - Idea #12197 (Resolved): Add PDH column and filter to trash display https://dev.arvados.org/issues/12197 2017-08-30T19:03:33Z Tom Morris tfmorris@veritasgenetics.com
Arvados - Idea #12085 (Resolved): Add monitoring/alarm for failed/slow job dispatch & excess idle... https://dev.arvados.org/issues/12085 2017-08-08T14:02:38Z Tom Morris tfmorris@veritasgenetics.com
<p>We need some additional monitoring and alarms to catch situations like yesterday's crunch-dispatch.rb file descriptor issue.</p>
Some suggestions for alarm conditions:
<ul>
<li>more than N (15? 15% of running_nodes?) idle nodes for more than M (10?) minutes</li>
<li>jobs queued for more than 15 minutes when there is idle capacity in the cluster (running_nodes < 0.95 * max_nodes)</li>
</ul>
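<p>As an illustration only, the two suggested alarm conditions could be evaluated over a series of monitoring samples roughly as follows. This is a minimal sketch, not the monitoring implementation: the <code>Sample</code> fields, the function names, and the default thresholds (15 nodes, 10 minutes, 15 minutes queued) are assumptions taken from the tentative numbers above.</p>

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample. All field names are hypothetical."""
    minute: int                   # minutes since monitoring started
    idle_nodes: int
    running_nodes: int
    max_nodes: int
    oldest_queued_minutes: float  # age of the oldest queued job (0 if none)


def sustained(samples, predicate, minutes):
    """True if predicate held for every trailing sample spanning >= minutes."""
    if not samples or not predicate(samples[-1]):
        return False
    start = samples[-1].minute
    for s in reversed(samples):
        if not predicate(s):
            break
        start = s.minute
    return samples[-1].minute - start >= minutes


def idle_alarm(samples, max_idle=15, sustain_minutes=10):
    """More than max_idle idle nodes for at least sustain_minutes."""
    return sustained(samples, lambda s: s.idle_nodes > max_idle, sustain_minutes)


def queue_alarm(sample, max_queued_minutes=15, capacity_fraction=0.95):
    """Jobs queued too long while the cluster still has spare capacity."""
    has_capacity = sample.running_nodes < capacity_fraction * sample.max_nodes
    return sample.oldest_queued_minutes > max_queued_minutes and has_capacity
```

<p>Requiring the condition to hold across the whole trailing window is one way to ignore brief transients; a single sample below the threshold resets the window.</p>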
<p>The thresholds, sampling periods, and trigger periods can be adjusted as we gain experience with what's too little or too much. The goal is to ignore brief transients or normal steady-state churn, but quickly (< 1 hr) catch abnormal conditions which otherwise take us hours to notice on an ad hoc basis.</p>
Arvados - Idea #12032 (Resolved): [API] Allow projects to be deleted (ie placed in the trash can) https://dev.arvados.org/issues/12032 2017-07-25T18:39:23Z Tom Morris tfmorris@veritasgenetics.com
<p>As a user, I would like deleted projects to be placed in the Trash rather than cluttering my Home project. When the Trash is emptied, the project and all of its contents should be deleted recursively.</p>
Details about desired behavior after the Workbench "trash" button is hit on a project:
<ul>
<li>The project and everything below it (including other subprojects) stop showing up in the usual places in Workbench, arv-mount, etc., even if there are explicit permission links (like collection-sharing links) to the project or its contents.</li>
<li>The project and its contents can still be retrieved using "get" and "list" API calls with the include_trash flag.</li>
<li>The project appears in its parent project's "trashed items" tab in Workbench. Clicking the project shows the trashed project with its implicitly-trashed contents: there is an obvious indication that the project itself is trashed, and (TBD?) all of its contents appear in the "trashed items" tab instead of their usual places.</li>
<li>If a user follows an old link/bookmark to the project, the page is not found. Except: (TBD?) The "not found" error page should acknowledge that the project is still available in the trash, and offer a link to the "trashed project" view described above.</li>
<li>The project and all of its previously-untrashed contents can be un-trashed by calling the "untrash" method on the project. However, if the project already contained items which were trash before the project was trashed, untrashing the project does not untrash those items.</li>
<li>Any individual item contained in the trashed project can be untrashed by changing its owner_uuid field to a project/user that is not trashed.</li>
<li>Data integrity: There is no sequence of API calls that could result in a collection being deleted (or otherwise made invisible to keep-balance) less than blobSignatureTTL seconds after the last time a client read or updated that collection's manifest.</li>
</ul>
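<p>The trash/untrash behavior listed above can be modelled with a small in-memory sketch. This is not the API server's data model: the <code>Item</code> class, its attributes, and the <code>visible</code> helper are hypothetical, and ownership is simplified to a single parent reference standing in for owner_uuid.</p>

```python
class Item:
    """A project or collection in a simplified ownership tree (hypothetical)."""

    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner    # stands in for owner_uuid
        self.trashed = False  # explicit flag on this item only

    def effectively_trashed(self):
        """An item is trashed if it, or any project above it, is trashed."""
        item = self
        while item is not None:
            if item.trashed:
                return True
            item = item.owner
        return False


def visible(items, include_trash=False):
    """Emulate get/list: hide trashed items unless include_trash is set."""
    return [i for i in items if include_trash or not i.effectively_trashed()]


home = Item("home")
project = Item("project", owner=home)
kept = Item("kept", owner=project)
old_junk = Item("old_junk", owner=project)

old_junk.trashed = True  # trashed before the project was
project.trashed = True   # trashing the project is a single flag change

everything = [project, kept, old_junk]
hidden_names = [i.name for i in visible(everything)]  # nothing is listed

project.trashed = False  # untrash the project
restored_names = [i.name for i in visible(everything)]  # old_junk stays trashed
```

<p>Because only the project's own flag changes, trashing or untrashing a large hierarchy stays a constant-time update in this model, and items that were trashed individually beforehand keep their own flag.</p>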
The trash/untrash APIs should be similar to the collection trash/untrash APIs. Specifically:
<ul>
<li>Each project should have a delete_at timestamp that can be set to a time ≥blobSignatureTTL in the future</li>
</ul>
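<p>A check for the delete_at constraint might look like the sketch below. The function name is hypothetical, and the TTL value used here (two weeks) is an assumption for illustration rather than a value stated in this issue.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumed TTL for illustration only; the real value comes from cluster config.
BLOB_SIGNATURE_TTL = timedelta(seconds=1209600)


def delete_at_is_valid(delete_at, now=None, ttl=BLOB_SIGNATURE_TTL):
    """True if delete_at is at least ttl in the future relative to now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return delete_at >= now + ttl
```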
<p>The trash/untrash APIs should work quickly and atomically, even when a large hierarchy of items is affected. This implies that the permission graph gains a "trashed?" flag, which is taken into account when retrieving results for get/list APIs, even for admin users.</p>
Components to be updated:
<ul>
<li>API server</li>
<li>API docs</li>
<li>Workbench</li>
<li>arv-mount</li>
</ul>
Arvados - Task #11018 (Resolved): Add support for InitialWorkDirRequirement to arvados-cwl-runner https://dev.arvados.org/issues/11018 2017-01-31T21:38:50Z Tom Morris tfmorris@veritasgenetics.com
Arvados - Idea #10604 (Resolved): Additional Crunch2 CWL User Guide updates https://dev.arvados.org/issues/10604 2016-11-23T19:45:34Z Tom Morris tfmorris@veritasgenetics.com
<p>A few ideas (I can work on these, once we agree on the list of changes to be made):</p>
1. Run a pipeline using Workbench section:
<ul>
<li>Rename as "Using Crunch" or something like that</li>
<li>Move "Introduction to Crunch" from "Develop an Arvados Pipeline" as the first item in this section</li>
<li>Add a new page "Running a workflow using Workbench" after "Accessing workbench". This would, of course, require adding a tutorial workflow to qr1hi and authoring this page.</li>
<li>Rename "Running a pipeline using workbench" as "Running a pipeline using workbench (Deprecated)"</li>
</ul>
2. user/cwl/cwl-runner.html
<ul>
<li>Registering a workflow with Workbench => "Use --create-template to register a CWL workflow" -- use "--create-workflow" instead?</li>
</ul>
3. topics/arv-copy.html
<ul>
<li>How to copy a pipeline template => Do we need to add any notes here about workflows? Otherwise, add a "How to copy a workflow" section?</li>
</ul>
4. Working on the command line
<ul>
<li>Append "(Deprecated)" to both page titles?</li>
<li>Do we want to add a page to work with container_requests and / or workflows here?</li>
</ul>
5. Develop an Arvados pipeline
<ul>
<li>Rename it as "Develop an Arvados pipeline (Deprecated)" ?</li>
<li>This would require moving a couple of pages.</li>
<li>Move "Introduction to Crunch" to the top of the list as noted in the first comment</li>
<li>Add a new "Docker" section and move "Customizing Crunch environment using Docker" into it?</li>
</ul>
Arvados - Bug #10587 (Resolved): All Python CLI utilities should report --version https://dev.arvados.org/issues/10587 2016-11-22T17:06:00Z Tom Morris tfmorris@veritasgenetics.com
Arvados - Idea #10354 (New): Add varchar_pattern_ops to all relevant PostgreSQL UUID indexes https://dev.arvados.org/issues/10354 2016-10-25T21:49:37Z Tom Morris tfmorris@veritasgenetics.com
<p>Change all UUID varchar B-tree indexes to use varchar_pattern_ops, so that we can do efficient prefix searches as described in <br /><a class="external" href="https://dev.arvados.org/issues/10028#note-11">https://dev.arvados.org/issues/10028#note-11</a></p>
<p>Add a database migration to recreate the indexes, and flag to Ops that the migration needs to be run as part of the upgrade/deployment process.</p>
Arvados - Support #10187 (Resolved): Add support for PartitionName hint to arvados-cwl-runner https://dev.arvados.org/issues/10187 2016-10-04T18:31:35Z Tom Morris tfmorris@veritasgenetics.com
Arvados - Feature #9877 (Duplicate): Add upload rate limit to arv-put https://dev.arvados.org/issues/9877 2016-08-29T19:55:56Z Tom Morris tfmorris@veritasgenetics.com
<p>As a file uploading user, I may want to limit the amount of network bandwidth that arv-put uses so that it plays nicely with other consumers of bandwidth. Cf. the --bwlimit option on rsync.</p>