Arvados: Issues (https://dev.arvados.org/), updated 2023-03-01T20:16:22Z
Arvados - Bug #20184 (Rejected): Do small batches and small commits in UpdatePriority.run_update_... (https://dev.arvados.org/issues/20184, 2023-03-01T20:16:22Z, Tom Clegg, tom@curii.com)
<p>Partial fix for <a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: Memory usage: Move after_commit UpdatePriority.run_update_thread to controller (Resolved)" href="https://dev.arvados.org/issues/20183">#20183</a>.</p>
<p>We are seeing a deadlock/starvation issue: a container whose crunch-run process has ended cannot be finalized, even with repeated attempts by arvados-dispatch-cloud (with a 5-minute PostgreSQL statement timeout), because its "select for update" is blocked by an UpdatePriority thread that is doing CPU- and memory-intensive work while holding a database transaction open with uncommitted updates.</p>
<p>The idea here is to reduce the duration of open transactions in UpdatePriority threads.</p>
Arvados - Bug #16457 (Rejected): [ws] websocket server should obey TLS.Insecure config when conne... (https://dev.arvados.org/issues/16457, 2020-05-20T15:54:56Z, Tom Clegg, tom@curii.com)
Arvados - Bug #12893 (Duplicate): [Crunch2] Logs should be saved to disk when container is cancelled (https://dev.arvados.org/issues/12893, 2018-01-02T23:22:02Z, Tom Clegg, tom@curii.com)
<p>crunch-run tries to save a log file after the container ends, regardless of final state, but (sometimes?) this doesn't work. Example: <a href="https://arvadosapi.com/su92l-xvhdp-4j98m0zgu9xst51">su92l-xvhdp-4j98m0zgu9xst51</a></p>
Some possible explanations:
<ul>
<li>crunch-dispatch-slurm cancels the Slurm job as soon as it notices the container is cancelled. crunch-run catches SIGTERM and tries to write the buffered output and logs, but (according to sample logs) seems to give up 30-40 seconds later without actually writing them.</li>
<li>Even if crunch-run gets that far, it seems the apiserver would refuse to update the output or log field of a container whose state is Cancelled.</li>
</ul>
Arvados - Bug #11822 (Duplicate): [API] Add recursive and include_trash params missing from disco... (https://dev.arvados.org/issues/11822, 2017-06-07T17:13:25Z, Tom Clegg, tom@curii.com)
Arvados - Bug #10568 (Duplicate): crunchstat-summary should look for child jobs in "components" f... (https://dev.arvados.org/issues/10568, 2016-11-18T20:48:41Z, Tom Clegg, tom@curii.com)
Currently, crunchstat-summary finds child jobs by looking at:
<ul>
<li>the "components" field, when processing a pipeline instance</li>
<li>"Queued job {uuid}" text in the log messages, when processing a job</li>
</ul>
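<p>A minimal Go sketch (crunchstat-summary itself is written in Python) of child-job discovery that consults a job record's "components" field in addition to the "Queued job {uuid}" log text. The <code>jobRecord</code> shape, the <code>childJobUUIDs</code> function, and the UUID regular expression are assumptions made for illustration, not the tool's actual data model.</p>

```go
package main

import (
	"regexp"
	"sort"
)

// jobRecord is a hypothetical, simplified view of a job: Components maps a
// component name to a child job UUID, and LogText is the job's stderr log.
type jobRecord struct {
	Components map[string]string
	LogText    string
}

// queuedJobRe matches "Queued job <uuid>" log text. The UUID pattern
// (5-5-15 lowercase alphanumerics) is an assumption.
var queuedJobRe = regexp.MustCompile(`Queued job ([a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15})`)

// childJobUUIDs returns the deduplicated, sorted child job UUIDs found in
// either the components field or the log text.
func childJobUUIDs(job jobRecord) []string {
	seen := map[string]bool{}
	for _, uuid := range job.Components {
		seen[uuid] = true
	}
	for _, m := range queuedJobRe.FindAllStringSubmatch(job.LogText, -1) {
		seen[m[1]] = true
	}
	uuids := make([]string, 0, len(seen))
	for uuid := range seen {
		uuids = append(uuids, uuid)
	}
	sort.Strings(uuids)
	return uuids
}
```

<p>Checking both sources keeps older jobs (which only logged child UUIDs) working while also picking up jobs whose children are recorded only in "components".</p>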
<p>Since crunchstat-summary was written, we have added a "components" field to job records, and the CWL runner saves the child job UUIDs there instead of logging them as stderr text. Therefore, crunchstat-summary does not see them.</p>
Arvados - Bug #10521 (Duplicate): [SDKs] [CLI] "arv collection list" retrieves manifest_text even... (https://dev.arvados.org/issues/10521, 2016-11-11T16:00:20Z, Tom Clegg, tom@curii.com)
<p>The collections.list API is supposed to omit manifest_text unless it is explicitly requested, and this works for other SDKs like Python.</p>
<p>However, "arv collection list" retrieves and outputs manifest_text values.</p>
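<p>A minimal Go sketch of the intended CLI behavior (the real <code>arv</code> command is not written in Go). The hypothetical <code>listParams</code> function adds a <code>select</code> request parameter only when the caller actually passed <code>--select</code>, so the server's default column selection, which omits manifest_text, applies otherwise. Sending <code>select</code> as a JSON array of attribute names is an assumption about the request encoding.</p>

```go
package main

import (
	"encoding/json"
	"flag"
	"net/url"
)

// listParams builds query parameters for a collections.list request from
// command-line arguments. "select" is included only if --select was given.
func listParams(args []string) (url.Values, error) {
	fs := flag.NewFlagSet("collection list", flag.ContinueOnError)
	sel := fs.String("select", "", "JSON array of attributes to return")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	// Distinguish "flag not given" from "flag given": Visit only reports
	// flags that were actually set on the command line.
	selectGiven := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "select" {
			selectGiven = true
		}
	})
	params := url.Values{}
	if selectGiven {
		// Reject values that are not a JSON array of strings.
		var cols []string
		if err := json.Unmarshal([]byte(*sel), &cols); err != nil {
			return nil, err
		}
		params.Set("select", *sel)
	}
	return params, nil
}
```
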
<p>It should not override the default column selection unless the caller uses the "--select" option.</p>
Arvados - Bug #10120 (Rejected): [Crunch] crunch-dispatch log throttling should not apply to its ... (https://dev.arvados.org/issues/10120, 2016-09-22T18:37:34Z, Tom Clegg, tom@curii.com)
<p>The premise of log throttling is that an unruly job (producing too much log) shouldn't cause crunch-dispatch or postgres to stop doing <em>other</em> work properly. See <a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: [API] In crunch-dispatch, throttle by bytes_per_minute or _node_minute (Resolved)" href="https://dev.arvados.org/issues/3769">#3769</a>.</p>
<p>In the current implementation, when logs are being throttled, they also stop appearing in crunch-dispatch's own stderr, so even a sysadmin can't see what the job is doing. This is especially annoying when the "maximum bytes per job" limit is reached: no feedback is available anywhere until the job finishes.</p>
Throttling crunch-dispatch's stderr seems undesirable:
<ul>
<li>these logs are already assumed to be processed and rotated efficiently by some external process like runit's svlogd, so the benefit is small</li>
<li>the sysadmin's ability to see logs during busy times is important</li>
</ul>
<p>The only real benefit of throttling logs seems to be avoiding the cost of splitting chunks of stderr into lines and prepending the job uuid to each line as needed.</p>
<p>Perhaps we can make the processing more efficient without losing the logs entirely -- e.g., skip the "prepend job uuid to each line" part, and dump many lines at once to stderr when throttled?</p>
Arvados - Bug #9989 (Duplicate): [SDKs] Python SDK should use a more recent version of the google... (https://dev.arvados.org/issues/9989, 2016-09-08T17:43:03Z, Tom Clegg, tom@curii.com)
<p>The current version of google-api-python-client is 1.5.3, but the Python SDK specifies 1.4.2; the commit comment (2016-02-19) cites compatibility issues with oauth2client 2.0. This seems to have been resolved in <a class="external" href="https://github.com/google/google-api-python-client/commit/bb324623419a22fb14da1e22a847b6b09a178aad">https://github.com/google/google-api-python-client/commit/bb324623419a22fb14da1e22a847b6b09a178aad</a>.</p>
Arvados - Bug #8907 (Duplicate): [SSO] package should not depend on postgresql (https://dev.arvados.org/issues/8907, 2016-04-07T15:19:18Z, Tom Clegg, tom@curii.com)
Arvados - Bug #8151 (Rejected): [crunchstat-summary] jobs with tasks_per_node>1 need node size re... (https://dev.arvados.org/issues/8151, 2016-01-07T15:44:47Z, Tom Clegg, tom@curii.com)
Arvados - Bug #6943 (Duplicate): [Git hosting] arvados-git-httpd should return 4xx, not 5xx, for ... (https://dev.arvados.org/issues/6943, 2015-08-10T17:46:29Z, Tom Clegg, tom@curii.com)
<p>Currently, as verified by tests, arvados-git-httpd returns 500 when all servers are working correctly but the client uses an invalid token.</p>
<p>Existing test case:</p>
<pre>
func (s *IntegrationSuite) TestInvalidToken(c *check.C) {
	for _, repo := range []string{"active/foo.git", "active/foo/.git"} {
		err := s.runGit(c, "no-such-token-in-the-system", "fetch", repo)
		// Documents current behavior: an invalid token yields a 500 response.
		c.Assert(err, check.ErrorMatches, `.* 500 while accessing.*`)
	}
}
</pre>
<p>This response should have been 401, not 500.</p>
Arvados - Bug #4841 (Rejected): [Workbench] When using "Run pipeline" button at the top of projec... (https://dev.arvados.org/issues/4841, 2014-12-18T18:57:09Z, Tom Clegg, tom@curii.com)
Arvados - Bug #4102 (Rejected): [Keep] Fix FUSE test suite so it does not need to be updated ever... (https://dev.arvados.org/issues/4102, 2014-10-03T19:50:21Z, Tom Clegg, tom@curii.com)
Arvados - Bug #3418 (Rejected): [Crunch] crunch-job ignores task sequence (https://dev.arvados.org/issues/3418, 2014-07-30T09:55:17Z, Tom Clegg, tom@curii.com)
<p>Example: <a href="https://arvadosapi.com/qr1hi-d1hrv-a9kf60pjllvn2v4">qr1hi-d1hrv-a9kf60pjllvn2v4</a></p>
Arvados - Bug #3077 (Rejected): Add migration to convert remaining non-root collection owner_uuid... (https://dev.arvados.org/issues/3077, 2014-06-23T09:42:49Z, Tom Clegg, tom@curii.com)
<p>The Collection model used to have its owner_uuid set to the uuid of the user who created it. Now owner_uuid is always system_user_uuid. Old collections still need to be migrated.</p>
For each collection whose owner_uuid is not system_user_uuid, we need to:
<ol>
<li>Create a new link with
<ul>
<li>owner_uuid = tail_uuid = old collection's owner_uuid</li>
<li>head_uuid = old collection's uuid</li>
<li>link_class = "name" </li>
<li>name = old collection's uuid</li>
<li>system metadata attributes (created_by*, etc) copied from the old collection</li>
</ul>
</li>
<li>Change owner_uuid to system_user_uuid</li>
</ol>
<p>Example:</p>
<pre>
{
"uuid": "07046024ba76642c9f4f27dac4dc931d+242",
"locator": null,
"owner_uuid": "<a href="https://arvadosapi.com/qr1hi-j7d0g-it30l961gq3t0oi">qr1hi-j7d0g-it30l961gq3t0oi</a>",
"created_at": "2014-02-10 08:25:21 UTC",
"modified_by_client_uuid": "<a href="https://arvadosapi.com/qr1hi-ozdt8-obw7foaks3qjyej">qr1hi-ozdt8-obw7foaks3qjyej</a>",
"modified_by_user_uuid": "<a href="https://arvadosapi.com/qr1hi-tpzed-tpj2ff66551eyym">qr1hi-tpzed-tpj2ff66551eyym</a>",
"modified_at": "2014-02-14 18:28:18 UTC",
"portable_data_hash": null,
"redundancy": null,
"redundancy_confirmed_by_client_uuid": null,
"redundancy_confirmed_at": null,
"redundancy_confirmed_as": null,
"updated_at": null,
"manifest_text": ". 37a4633d6484823b4bef5eb818c88bc2+67108864+K@1h9kt+A647c3a9e4b2ce8944b3798404244ed0b8f491f56@53ba9fda 43e2e0a568f2ed215d9966e5c67e498b+67108864+K@1h9kt+Af62db167881509c58b3361ed297478df477e554b@53ba9fda cd7b4bed23e8feae93656f3396de8964+67108864+K@1h9kt+A84203fab608e1e17d0c5c3a9b239182e72fe3d53@53ba9fda b14da7079ff81e9c0c493df888fb6ca4+43499401+K@1h9kt+A2b6aa3cd407fcd33bc94edc7ba36ea4ee8b47c22@53ba9fda 0:244825993:var-GS000015886-ASM.tsv.bz2\n"
}
</pre>
Existing name links could cause a unique constraint violation. In that case, either:
<ul>
<li>head_uuid = name. The new name link would be redundant, so we should just not create it.</li>
<li>head_uuid does not match, i.e., a user/project has a collection whose name is the uuid of a different collection. This is unlikely enough that skipping creation of the new name link (whose name would be the collection's own uuid) seems like an acceptable sacrifice.</li>
</ul>
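<p>A minimal Go sketch of the migration plan described above (the actual migration would be a Rails migration, not Go). <code>Collection</code>, <code>Link</code>, and <code>planMigration</code> are hypothetical, trimmed-down stand-ins, and the set of copied metadata fields is an assumption based on the example record. The sketch also assumes the unique constraint covers tail_uuid + name among links with link_class "name"; a new link is skipped whenever such a link already exists, which covers both cases listed above, and owner_uuid is changed to system_user_uuid either way.</p>

```go
package main

// Collection and Link carry only the fields this migration touches.
type Collection struct {
	UUID                 string
	OwnerUUID            string
	CreatedAt            string
	ModifiedAt           string
	ModifiedByClientUUID string
	ModifiedByUserUUID   string
}

type Link struct {
	OwnerUUID            string
	TailUUID             string
	HeadUUID             string
	LinkClass            string
	Name                 string
	CreatedAt            string
	ModifiedAt           string
	ModifiedByClientUUID string
	ModifiedByUserUUID   string
}

// nameKey identifies a name link under the assumed uniqueness constraint.
type nameKey struct{ tail, name string }

// planMigration returns the name links to create, and sets OwnerUUID to
// systemUserUUID on every collection in colls (modifying the slice in place).
func planMigration(colls []Collection, existing []Link, systemUserUUID string) []Link {
	taken := map[nameKey]bool{}
	for _, l := range existing {
		if l.LinkClass == "name" {
			taken[nameKey{l.TailUUID, l.Name}] = true
		}
	}
	var created []Link
	for i := range colls {
		c := &colls[i]
		if c.OwnerUUID == systemUserUUID {
			continue // already migrated
		}
		key := nameKey{c.OwnerUUID, c.UUID}
		if !taken[key] {
			created = append(created, Link{
				OwnerUUID:            c.OwnerUUID,
				TailUUID:             c.OwnerUUID,
				HeadUUID:             c.UUID,
				LinkClass:            "name",
				Name:                 c.UUID,
				CreatedAt:            c.CreatedAt,
				ModifiedAt:           c.ModifiedAt,
				ModifiedByClientUUID: c.ModifiedByClientUUID,
				ModifiedByUserUUID:   c.ModifiedByUserUUID,
			})
			taken[key] = true
		}
		c.OwnerUUID = systemUserUUID
	}
	return created
}
```
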
<p>This is also the best time to start <strong>enforcing the <code>owner_uuid==system_user_uuid</code> constraint at the Collection model level</strong> by overriding <code>ensure_owner_uuid_is_permitted</code> to return false if <code>owner_uuid != system_user_uuid</code>.</p>