Bug #20990
closedNeed a btree index for postgres to do efficient '=' operations on name
Description
We added a trigram gin index on collection 'name' in #14070
Specifically on https://dev.arvados.org/issues/14070#note-12 we decided that we only needed the trigam gin index because the postgres documentation stated it could be used for the "=" operator.
However, on a user cluster, that doesn't seem to be true. Explain analyze has '=' doing a parallel seq scan.
So we need to add a btree index on 'name' anyway.
Updated by Peter Amstutz about 1 year ago
- Status changed from New to In Progress
Updated by Peter Amstutz about 1 year ago
20990-name-btree @ 03a9907f84d2722bd11ce8c3095de4a5ad3e07b6
developer-run-tests: #3830
- All agreed upon points are implemented / addressed.
- Added migration that adds btree index on 'name'
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described
- Ran migration in arvbox, automated testing tests database setup
- Documentation has been updated.
- n/a
- Behaves appropriately at the intended scale (describe intended scale).
- Greatly improves queries on 'name'
- Considered backwards and forwards compatibility issues between client and server.
- n/a
- Follows our coding standards and GUI style guidelines.
- yes
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-09-27 sprint to Development 2023-10-11 sprint
Updated by Tom Clegg about 1 year ago
- Related to Idea #14004: [Controller] Keep database in tune added
Updated by Tom Clegg about 1 year ago
If the poor planning was caused by out-of-date statistics, it's possible that adding a new index improved the situation not because the new index was better, but because the statistics got updated.
From https://www.postgresql.org/docs/current/planner-stats.html
Perhaps we shouldFor efficiency reasons, reltuples and relpages are not updated on-the-fly, and so they usually contain somewhat out-of-date values. They are updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.
- try dropping the new index on the affected cluster (and hitting ANALYZE?) and see whether the performance improvement goes away
- get in the habit of first trying ANALYZE when we see evidence of poor planning.
Or we could just merge this on the assumption postgres docs are misleading re trigram index being used for "=", even it seems like we are missing an opportunity to learn things we need for #14004, and postgres docs are not usually misleading. If that's our approach, the branch LGTM.
Updated by Peter Amstutz about 1 year ago
I did run a vacuum analyze on that database a few weeks ago, but it's true that I did not run analyze before adding the index.
Without the btree index, a query with name = 'foo'
does not appear to use the trigram index and takes more than a minute.
A query with 'like' (name like 'foo'
for exact match) does use the trigram index and takes about 3 seconds.
With the btree index, a query name = 'foo'
takes about 3 milliseconds.
Updated by Tom Clegg about 1 year ago
Peter Amstutz wrote in #note-12:
Without the btree index, a query with
name = 'foo'
does not appear to use the trigram index and takes more than a minute.
Do you mean after dropping the btree index? (If so, I think that would confirm that vacuuming without adding a btree index would not have made much difference.)
A query with 'like' (
name like 'foo'
for exact match) does use the trigram index and takes about 3 seconds.With the btree index, a query
name = 'foo'
takes about 3 milliseconds.
I think that means name = 'foo'
would probably also take 3s if you removed the btree index, right?
Improving the performance of "=" queries from 3s to 3ms sounds worthwhile, so in that case the vacuum/statistics thing might still be an interesting question, but adding the btree index is a good idea either way.
LGTM
Updated by Peter Amstutz about 1 year ago
- Status changed from In Progress to Resolved