Project

General

Profile

Actions

Bug #20990

closed

Need a btree index for postgres to do efficient '=' operations on name

Added by Peter Amstutz 7 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Story points:
-
Release relationship:
Auto

Description

We added a trigram gin index on collection 'name' in #14070

Specifically on https://dev.arvados.org/issues/14070#note-12 we decided that we only needed the trigam gin index because the postgres documentation stated it could be used for the "=" operator.

However, on a user cluster, that doesn't seem to be true. Explain analyze has '=' doing a parallel seq scan.

So we need to add a btree index on 'name' anyway.


Subtasks 1 (0 open1 closed)

Task #20992: Review 20990-name-btreeResolvedPeter Amstutz09/25/2023Actions

Related issues

Related to Arvados - Idea #14004: [Controller] Keep database in tuneNewActions
Actions #1

Updated by Peter Amstutz 7 months ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz 7 months ago

  • Release set to 67
Actions #3

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz 7 months ago

  • Assigned To set to Peter Amstutz
Actions #7

Updated by Peter Amstutz 7 months ago

20990-name-btree @ 03a9907f84d2722bd11ce8c3095de4a5ad3e07b6
developer-run-tests: #3830

  • All agreed upon points are implemented / addressed.
    • Added migration that adds btree index on 'name'
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • Ran migration in arvbox, automated testing tests database setup
  • Documentation has been updated.
    • n/a
  • Behaves appropriately at the intended scale (describe intended scale).
    • Greatly improves queries on 'name'
  • Considered backwards and forwards compatibility issues between client and server.
    • n/a
  • Follows our coding standards and GUI style guidelines.
    • yes
Actions #8

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-09-27 sprint to Development 2023-10-11 sprint
Actions #9

Updated by Peter Amstutz 7 months ago

  • Category set to API
Actions #10

Updated by Tom Clegg 7 months ago

  • Related to Idea #14004: [Controller] Keep database in tune added
Actions #11

Updated by Tom Clegg 7 months ago

If the poor planning was caused by out-of-date statistics, it's possible that adding a new index improved the situation not because the new index was better, but because the statistics got updated.

From https://www.postgresql.org/docs/current/planner-stats.html

For efficiency reasons, reltuples and relpages are not updated on-the-fly, and so they usually contain somewhat out-of-date values. They are updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.

Perhaps we should
  • try dropping the new index on the affected cluster (and hitting ANALYZE?) and see whether the performance improvement goes away
  • get in the habit of first trying ANALYZE when we see evidence of poor planning.

Or we could just merge this on the assumption postgres docs are misleading re trigram index being used for "=", even it seems like we are missing an opportunity to learn things we need for #14004, and postgres docs are not usually misleading. If that's our approach, the branch LGTM.

Actions #12

Updated by Peter Amstutz 7 months ago

I did run a vacuum analyze on that database a few weeks ago, but it's true that I did not run analyze before adding the index.

Without the btree index, a query with name = 'foo' does not appear to use the trigram index and takes more than a minute.

A query with 'like' (name like 'foo' for exact match) does use the trigram index and takes about 3 seconds.

With the btree index, a query name = 'foo' takes about 3 milliseconds.

Actions #13

Updated by Tom Clegg 7 months ago

Peter Amstutz wrote in #note-12:

Without the btree index, a query with name = 'foo' does not appear to use the trigram index and takes more than a minute.

Do you mean after dropping the btree index? (If so, I think that would confirm that vacuuming without adding a btree index would not have made much difference.)

A query with 'like' (name like 'foo' for exact match) does use the trigram index and takes about 3 seconds.

With the btree index, a query name = 'foo' takes about 3 milliseconds.

I think that means name = 'foo' would probably also take 3s if you removed the btree index, right?

Improving the performance of "=" queries from 3s to 3ms sounds worthwhile, so in that case the vacuum/statistics thing might still be an interesting question, but adding the btree index is a good idea either way.

LGTM

Actions #14

Updated by Peter Amstutz 7 months ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF