Arvados: Issues https://dev.arvados.org/ 2023-03-01T20:16:22Z Arvados Redmine
Arvados - Bug #20184 (Rejected): Do small batches and small commits in UpdatePriority.run_update_... https://dev.arvados.org/issues/20184 2023-03-01T20:16:22Z Tom Clegg tom@curii.com
<p>Partial fix for <a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: Memory usage: Move after_commit UpdatePriority.run_update_thread to controller (Resolved)" href="https://dev.arvados.org/issues/20183">#20183</a>.</p>
<p>We are seeing a deadlock/starvation issue: a container whose crunch-run process has ended cannot be finalized, even with repeated attempts by arvados-dispatch-cloud (5m PostgreSQL statement timeout). Its "select for update" is blocked by an UpdatePriority thread that does CPU- and memory-intensive work while keeping a database transaction open with uncommitted updates.</p>
<p>The idea here is to reduce the duration of open transactions in UpdatePriority threads.</p>
Arvados - Bug #16457 (Rejected): [ws] websocket server should obey TLS.Insecure config when conne... https://dev.arvados.org/issues/16457 2020-05-20T15:54:56Z Tom Clegg tom@curii.com
Arvados - Idea #14611 (Duplicate): [Epic] Site-wide search for text, filenames, data https://dev.arvados.org/issues/14611 2018-12-12T19:19:10Z Tom Clegg tom@curii.com
Arvados has had a "site-wide search" feature but it often fails to meet users' expectations.
<ul>
<li>Full-text search doesn't find exact strings (<a class="issue tracker-6 status-7 priority-4 priority-default closed" title="Idea: Fix postgres search for filenames (Duplicate)" href="https://dev.arvados.org/issues/13508">#13508</a>) and doesn't index all filenames in large collections (#13752, <a class="issue tracker-1 status-3 priority-6 priority-high2 closed" title="Bug: [1.3.0] error: ERROR: string is too long for tsvector (2299194 bytes, max 1048575 bytes) (Resolved)" href="https://dev.arvados.org/issues/14560">#14560</a>).</li>
<li>Substring search is slow, and doesn't index full rows (this is why full-text search was added).</li>
<li>No facility at all for searching file contents.</li>
</ul>
<p>We may be able to use PostgreSQL's full-text search to address everything short of searching file contents, with a bit more work on our side (use a dictionary/language other than English, create a table of filenames instead of searching a huge text field with a list of filenames, etc.).</p>
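<p>As a rough illustration of the "table of filenames" idea, the sketch below turns a collection's manifest text into one row per file path, so that each row stays small regardless of collection size. It assumes the usual manifest layout (a stream name, then block locators, then <code>position:size:name</code> file tokens, with spaces escaped as <code>\040</code>); the function name <code>FilenameRows</code> is hypothetical, and other escape sequences are not handled.</p>

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fileToken matches a "position:size:name" file token in a manifest line.
// Block locators contain no colons, so they do not match.
var fileToken = regexp.MustCompile(`^\d+:\d+:(.+)$`)

// unescape handles only the \040 (space) escape; this is a simplification.
func unescape(s string) string {
	return strings.ReplaceAll(s, `\040`, " ")
}

// FilenameRows (hypothetical helper) returns one full path per distinct file
// in manifestText, in order of first appearance. Each path could then be
// stored and indexed as its own row.
func FilenameRows(manifestText string) []string {
	seen := map[string]bool{}
	var rows []string
	for _, line := range strings.Split(manifestText, "\n") {
		tokens := strings.Fields(line)
		if len(tokens) < 2 {
			continue // blank or malformed line
		}
		stream := unescape(tokens[0])
		for _, tok := range tokens[1:] {
			m := fileToken.FindStringSubmatch(tok)
			if m == nil {
				continue // not a file token (e.g. a block locator)
			}
			path := stream + "/" + unescape(m[1])
			if !seen[path] {
				seen[path] = true
				rows = append(rows, path)
			}
		}
	}
	return rows
}

func main() {
	manifest := ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt\n" +
		"./sub 37b51d194a7513e45b56f6524f2d51f2+3 0:3:my\\040file.txt\n"
	for _, path := range FilenameRows(manifest) {
		fmt.Println(path)
	}
}
```

<p>Deduplicating paths matters because a file stored in several segments appears as several tokens in the same stream.</p>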
<p>Another approach would be to use a separate tool to index/search the database, and apply Arvados permissions to those results. This could conceivably index file contents as well as database rows.</p>
Arvados - Idea #13874 (Duplicate): [CLI] arvados-server "health" subcommand https://dev.arvados.org/issues/13874 2018-07-20T02:41:46Z Tom Clegg tom@curii.com
<p>Get current cluster health status using the health aggregator service (arvados-health), display it on stdout, and exit non-zero for any unhealthy/error state.</p>
Arvados - Bug #12893 (Duplicate): [Crunch2] Logs should be saved to disk when container is cancelled https://dev.arvados.org/issues/12893 2018-01-02T23:22:02Z Tom Clegg tom@curii.com
<p>crunch-run tries to save a log file after the container ends, regardless of final state, but (sometimes?) this doesn't work. Example: <a href="https://arvadosapi.com/su92l-xvhdp-4j98m0zgu9xst51">su92l-xvhdp-4j98m0zgu9xst51</a></p>
Some possible explanations:
<ul>
<li>crunch-dispatch-slurm cancels the slurm job as soon as it notices the container is cancelled. crunch-run catches SIGTERM and tries to write the buffered output and logs, but (according to sample logs) seems to give up 30-40 seconds later without actually writing them.</li>
<li>even if crunch-run gets that far, it seems apiserver would refuse to update the output or log field of a container whose state is Cancelled.</li>
</ul>
Arvados - Bug #11822 (Duplicate): [API] Add recursive and include_trash params missing from disco... https://dev.arvados.org/issues/11822 2017-06-07T17:13:25Z Tom Clegg tom@curii.com
Arvados - Bug #10568 (Duplicate): crunchstat-summary should look for child jobs in "components" f... https://dev.arvados.org/issues/10568 2016-11-18T20:48:41Z Tom Clegg tom@curii.com
Currently, crunchstat-summary finds child jobs by looking at
<ul>
<li>the "components" field, when processing a pipeline instance</li>
<li>"Queued job {uuid}" text in the log messages, when processing a job</li>
</ul>
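<p>The two lookups listed above can be sketched as a single helper that accepts either source, so child jobs are found whether they were recorded in a "components" field or only mentioned in log text. This is an illustration only: the shape of <code>components</code> (name mapped to job UUID), the UUID pattern, and the name <code>ChildJobUUIDs</code> are assumptions, not crunchstat-summary's actual code.</p>

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// queuedJob matches the "Queued job {uuid}" log text. The UUID pattern
// (5-5-15 lowercase alphanumerics) is an assumption for this sketch.
var queuedJob = regexp.MustCompile(`Queued job ([a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15})`)

// ChildJobUUIDs (hypothetical helper) returns the sorted union of job UUIDs
// found in a components map (component name -> job UUID) and in log lines.
func ChildJobUUIDs(components map[string]string, logLines []string) []string {
	seen := map[string]bool{}
	for _, uuid := range components {
		if uuid != "" {
			seen[uuid] = true
		}
	}
	for _, line := range logLines {
		if m := queuedJob.FindStringSubmatch(line); m != nil {
			seen[m[1]] = true
		}
	}
	uuids := make([]string, 0, len(seen))
	for uuid := range seen {
		uuids = append(uuids, uuid)
	}
	sort.Strings(uuids) // deterministic order for reporting
	return uuids
}

func main() {
	components := map[string]string{"step1": "zzzzz-8i9sb-0123456789abcde"}
	logLines := []string{"stderr Queued job zzzzz-8i9sb-0123456789abcdf"}
	for _, uuid := range ChildJobUUIDs(components, logLines) {
		fmt.Println(uuid)
	}
}
```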
<p>Since crunchstat-summary was written, we have added a "components" field to job records, and the CWL runner saves the child job UUIDs there instead of logging them as stderr text. Therefore, crunchstat-summary does not see them.</p>
Arvados - Bug #10521 (Duplicate): [SDKs] [CLI] "arv collection list" retrieves manifest_text even... https://dev.arvados.org/issues/10521 2016-11-11T16:00:20Z Tom Clegg tom@curii.com
<p>The collections.list API is supposed to omit manifest_text unless it is explicitly requested, and this works for other SDKs like Python.</p>
<p>However, "arv collection list" retrieves and outputs manifest_text values.</p>
<p>It should not override the default column selection unless the caller uses the "--select" option.</p>
Arvados - Bug #10120 (Rejected): [Crunch] crunch-dispatch log throttling should not apply to its ... https://dev.arvados.org/issues/10120 2016-09-22T18:37:34Z Tom Clegg tom@curii.com
<p>The premise of log throttling is that an unruly job (producing too much log) shouldn't cause crunch-dispatch or postgres to stop doing <em>other</em> work properly. See <a class="issue tracker-1 status-3 priority-4 priority-default closed parent" title="Bug: [API] In crunch-dispatch, throttle by bytes_per_minute or _node_minute (Resolved)" href="https://dev.arvados.org/issues/3769">#3769</a>.</p>
<p>In the current implementation, when logs are being throttled, the logs also stop appearing in crunch-dispatch's own stderr. Therefore, when logs are being throttled, even a sysadmin can't see what a job is doing. This is especially annoying when the "maximum bytes per job" limit is reached: there is no feedback available anywhere until the job finishes.</p>
Throttling crunch-dispatch's stderr seems undesirable:
<ul>
<li>these logs are already assumed to be processed and rotated efficiently by some external process like runit's svlogd, so the benefit is small</li>
<li>the sysadmin's ability to see logs during busy times is important</li>
</ul>
<p>The only real benefit of throttling logs seems to be avoiding the cost of splitting chunks of stderr into lines and prepending the job uuid to each line as needed.</p>
<p>Perhaps we can make the processing more efficient without losing the logs entirely -- e.g., skip the "prepend job uuid to each line" part, and dump many lines at once to stderr when throttled?</p>
Arvados - Bug #9989 (Duplicate): [SDKs] Python SDK should use a more recent version of the google... https://dev.arvados.org/issues/9989 2016-09-08T17:43:03Z Tom Clegg tom@curii.com
<p>Current version is 1.5.3. Python SDK specifies 1.4.2; commit comment (2016-02-19) cites compatibility issues with oauth2 client 2.0. This seems to have been resolved in <a class="external" href="https://github.com/google/google-api-python-client/commit/bb324623419a22fb14da1e22a847b6b09a178aad">https://github.com/google/google-api-python-client/commit/bb324623419a22fb14da1e22a847b6b09a178aad</a></p>
Arvados - Bug #8907 (Duplicate): [SSO] package should not depend on postgresql https://dev.arvados.org/issues/8907 2016-04-07T15:19:18Z Tom Clegg tom@curii.com
Arvados - Bug #8151 (Rejected): [crunchstat-summary] jobs with tasks_per_node>1 need node size re... https://dev.arvados.org/issues/8151 2016-01-07T15:44:47Z Tom Clegg tom@curii.com
Arvados - Bug #6943 (Duplicate): [Git hosting] arvados-git-httpd should return 4xx, not 5xx, for ... https://dev.arvados.org/issues/6943 2015-08-10T17:46:29Z Tom Clegg tom@curii.com
<p>Currently, as verified by tests, arvados-git-httpd returns 500 when the servers are all working perfectly but the client used an invalid token.</p>
<p>Existing test case:</p>
<pre>
func (s *IntegrationSuite) TestInvalidToken(c *check.C) {
	for _, repo := range []string{"active/foo.git", "active/foo/.git"} {
		err := s.runGit(c, "no-such-token-in-the-system", "fetch", repo)
		c.Assert(err, check.ErrorMatches, `.* 500 while accessing.*`)
	}
}
</pre>
<p>This response should have been 401, not 500.</p>
Arvados - Bug #4841 (Rejected): [Workbench] When using "Run pipeline" button at the top of projec... https://dev.arvados.org/issues/4841 2014-12-18T18:57:09Z Tom Clegg tom@curii.com
Arvados - Feature #4505 (Duplicate): [SDKs] "arv get {uuid}" outputs JSON representation of given... https://dev.arvados.org/issues/4505 2014-11-12T20:49:17Z Tom Clegg tom@curii.com