https://dev.arvados.org/https://dev.arvados.org/favicon.ico?15576888422018-10-17T15:55:57ZArvadosArvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=678902018-10-17T15:55:57ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Target version</strong> set to <i>2018-10-31 sprint</i></li></ul> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=678972018-10-17T16:21:45ZTom Morristfmorris@veritasgenetics.com
<ul><li><strong>Target version</strong> deleted (<del><i>2018-10-31 sprint</i></del>)</li></ul> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1117502023-02-14T22:22:08ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Release</strong> set to <i>60</i></li></ul> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1124892023-02-15T19:29:04ZBrett Smithbrett.smith@curii.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-1 priority-4 priority-default" href="/issues/13972">Bug #13972</a>: Listing collections by PDH and name can be very slow</i> added</li></ul> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1125222023-02-16T15:04:18ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Release</strong> deleted (<del><i>60</i></del>)</li><li><strong>Target version</strong> set to <i>Future</i></li></ul><p>I feel like it is highly likely it was fixed but someone should double check.</p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1125422023-02-16T16:51:39ZBrett Smithbrett.smith@curii.com
<ul></ul><p>Peter Amstutz wrote in <a href="#note-5">#note-5</a>:</p>
<blockquote>
<p>I feel like it is highly likely it was fixed but someone should double check.</p>
</blockquote>
<p>We have an index on owner_uuid and name, created in <code>20180919001158_recreate_collection_unique_name_index.rb</code> as part of <a class="issue tracker-2 status-3 priority-4 priority-default closed parent" title="Feature: [API] Store, and add APIs to retrieve, previous versions of collection objects (Resolved)" href="https://dev.arvados.org/issues/13561">#13561</a>, but we don't have an index on <em>just</em> name as suggested in this issue. But maybe it still meets the need? The reported search also checks owner_uuid.</p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1125442023-02-16T16:52:13ZBrett Smithbrett.smith@curii.com
<ul><li><strong>Story points</strong> set to <i>0.5</i></li></ul> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1125452023-02-16T18:23:44ZPeter Amstutzpeter.amstutz@curii.com
<ul></ul><p>Brett Smith wrote in <a href="#note-6">#note-6</a>:</p>
<blockquote>
<p>Peter Amstutz wrote in <a href="#note-5">#note-5</a>:</p>
<blockquote>
<p>I feel like it is highly likely it was fixed but someone should double check.</p>
</blockquote>
<p>We have an index on owner_uuid and name, created in <code>20180919001158_recreate_collection_unique_name_index.rb</code> as part of <a class="issue tracker-2 status-3 priority-4 priority-default closed parent" title="Feature: [API] Store, and add APIs to retrieve, previous versions of collection objects (Resolved)" href="https://dev.arvados.org/issues/13561">#13561</a>, but we don't have an index on <em>just</em> name as suggested in this issue. But maybe it still meets the need? The reported search also checks owner_uuid.</p>
</blockquote>
<p>I guess the issue here is that if the name doesn't exist at all, the query can terminate much faster. Instead, it is doing a full table scan for a name only to come up empty.</p>
<p>A separate standalone "name" index is probably not a bad idea.</p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1139532023-04-11T18:32:48ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Story points</strong> changed from <i>0.5</i> to <i>1.0</i></li><li><strong>Target version</strong> changed from <i>Future</i> to <i>To be scheduled</i></li></ul><p>Should do this for both collections and projects.</p>
<p>Adding a simple index on 'name' is low risk.</p>
<p>Investigate if adding a trigram index makes sense so that 'like' and 'ilike' searches are also efficient.</p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1139932023-04-12T15:58:33ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Target version</strong> changed from <i>To be scheduled</i> to <i>Development 2023-04-26 sprint</i></li></ul> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1139962023-04-12T15:59:45ZTom Cleggtom@curii.com
<ul><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li></ul> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1143352023-04-21T14:49:51ZTom Cleggtom@curii.com
<ul></ul><p>Peter Amstutz wrote in <a href="#note-9">#note-9</a>:</p>
<blockquote>
<p>Investigate if adding a trigram index makes sense so that 'like' and 'ilike' searches are also efficient.</p>
</blockquote>
<p>From <a class="external" href="https://www.postgresql.org/docs/current/pgtrgm.html">https://www.postgresql.org/docs/current/pgtrgm.html</a></p>
<p>"These index types support the above-described similarity operators, and additionally support trigram-based index searches for LIKE, ILIKE, ~, ~* and = queries. Inequality operators are not supported. Note that those indexes may not be as efficient as regular B-tree indexes for equality operator."</p>
<p>I think the short version is that if we want to speed up <code>name like '%string%'</code> as well as <code>name = 'string'</code>, then trigram is better. If we want to optimize for <code>=</code> and sacrifice un-anchored <code>like</code>, b-tree is better (smaller index, faster search).</p>
<p>Here's a branch with the trigram option.</p>
<p>14070-name-index @ <a class="changeset" title="14070: Add trigram indexes on collections.name and projects.name. Arvados-DCO-1.1-Signed-off-by:..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/b4dbb53529d133660403c1761f06cb321cbc70f6">b4dbb53529d133660403c1761f06cb321cbc70f6</a> -- <a class="external" href="https://ci.arvados.org/job/developer-run-tests/3617/"<a href="https://ci.arvados.org/job/developer-run-tests/3617/">developer-run-tests: #3617 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=3617" alt="" /></a></a></p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1143362023-04-21T14:49:56ZTom Cleggtom@curii.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1143492023-04-21T15:29:31ZTom Cleggtom@curii.com
<ul></ul><p>This was reported 2018-08-16 and we added the multi-column trigram indexes 2019-05-23. But a <code>name = 'string'</code> search on a current database still uses the multi-column (non-trigram) <code>collections_search_index</code> which is presumably not better than the <code>owner_uuid_and_name</code> index reported here.</p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1143562023-04-21T18:16:11ZTom Cleggtom@curii.com
<ul></ul><p>14070-name-index @ <a class="changeset" title="14070: Add trigram indexes on collections.name and projects.name. Arvados-DCO-1.1-Signed-off-by:..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/b4dbb53529d133660403c1761f06cb321cbc70f6">b4dbb53529d133660403c1761f06cb321cbc70f6</a> -- <a class="external" href="https://ci.arvados.org/job/developer-run-tests/3619/"<a href="https://ci.arvados.org/job/developer-run-tests/3619/">developer-run-tests: #3619 <img src="https://ci.arvados.org/buildStatus/icon?job=developer-run-tests&build=3619" alt="" /></a></a></p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1143602023-04-21T18:39:19ZBrett Smithbrett.smith@curii.com
<ul></ul><p>Tom Clegg wrote in <a href="#note-12">#note-12</a>:</p>
<blockquote>
<p>I think the short version is that if we want to speed up <code>name like '%string%'</code> as well as <code>name = 'string'</code>, then trigram is better. If we want to optimize for <code>=</code> and sacrifice un-anchored <code>like</code>, b-tree is better (smaller index, faster search).</p>
<p>Here's a branch with the trigram option.</p>
</blockquote>
<p>Without some metrics or other specifics pushing us to prioritize equality performance, I agree improving the performance of more searches seems like the better trade-off. This is good to merge, thanks.</p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1143742023-04-24T15:26:50ZTom Cleggtom@curii.com
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset <a class="changeset" title="Merge branch '14070-name-index' fixes #14070 Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curi..." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/604f652987e7200f24eec4e31393238ec3960989">arvados|604f652987e7200f24eec4e31393238ec3960989</a>.</p> Arvados - Bug #14070: API DB needs an index on collections namehttps://dev.arvados.org/issues/14070?journal_id=1175352023-08-24T17:58:44ZPeter Amstutzpeter.amstutz@curii.com
<ul><li><strong>Release</strong> set to <i>66</i></li></ul>