Feature #21815

closed

Exclude identifiers from trigram search

Added by Peter Amstutz 7 months ago. Updated 6 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Story points: -
Release:
Release relationship: Auto

Description

Inspired by #21737

This is what the full-text search index covers. The expression builds a single string from all the fields, separated by spaces, and indexes that string:

CREATE INDEX collections_trgm_text_search_idx ON public.collections USING gin (((((((((((((((((((
  COALESCE(owner_uuid, ''::character varying))::text || ' '::text) || (
  COALESCE(modified_by_client_uuid, ''::character varying))::text) || ' '::text) || (
  COALESCE(modified_by_user_uuid, ''::character varying))::text) || ' '::text) || (
  COALESCE(portable_data_hash, ''::character varying))::text) || ' '::text) || (
  COALESCE(uuid, ''::character varying))::text) || ' '::text) || (
  COALESCE(name, ''::character varying))::text) || ' '::text) || (
  COALESCE(description, ''::character varying))::text) || ' '::text) || 
  COALESCE((properties)::text, ''::text)) || ' '::text) || 
  COALESCE(file_names, ''::text)))
gin_trgm_ops)

Looking at this, I think it would be much better if all uuid fields and the portable data hash were excluded.

The reasoning is that UUIDs and the PDH are strings of effectively random alphanumeric characters. They generate a lot of trigrams that become potential matches, but not actual matches.
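
To make the trigram argument concrete, here is a rough Python sketch that approximates how pg_trgm extracts trigrams (lowercase, split on non-alphanumeric characters, pad each word with two leading spaces and one trailing space). It is only an approximation of the extension's behavior for ASCII input. The `trigrams` helper and the sample values are made up for illustration and are not taken from Arvados.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction for ASCII text.

    Each alphanumeric word is lowercased and padded with two leading
    spaces and one trailing space before 3-character windows are taken.
    """
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


# Hypothetical values: a short human-written name and an identifier-shaped string.
name = "sample reads"
fake_uuid = "zzzzz-4zz18-0123456789abcde"

name_trigrams = trigrams(name)
uuid_trigrams = trigrams(fake_uuid)

# The identifier contributes its own set of trigrams, none of which
# describe the record in a way a person would search for.
print(len(name_trigrams), "trigrams from the name")
print(len(uuid_trigrams), "trigrams from the identifier")
print(len(name_trigrams | uuid_trigrams), "trigrams indexed for both together")
```

With five identifier columns per row, most of the trigrams stored for a record would come from identifiers rather than from the name or description, which is the noise this ticket proposes to remove.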

Task:

  1. exclude portable_data_hash and any field ending _uuid from full_text_searchable_columns
  2. add a migration that recreates the trigram indexes for each table with the new full_text_coalesce
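
The two steps above can be sketched roughly as follows. This is a Python illustration of the idea only, not the project's actual code: the function names are hypothetical, the column list is copied from the collections index in the description, treating the bare `uuid` column as an identifier is an assumption based on "all uuid fields", and the generated SQL is simplified compared to what the real `full_text_coalesce` produces.

```python
# Column order copied from the collections index shown in the description.
COLLECTIONS_COLUMNS = [
    "owner_uuid", "modified_by_client_uuid", "modified_by_user_uuid",
    "portable_data_hash", "uuid", "name", "description",
    "properties", "file_names",
]


def is_identifier_column(column):
    """Assumed rule: the PDH, the bare uuid column, and anything ending in _uuid."""
    return (
        column == "portable_data_hash"
        or column == "uuid"
        or column.endswith("_uuid")
    )


def searchable_columns(columns):
    """Step 1: drop identifier columns from the searchable set."""
    return [c for c in columns if not is_identifier_column(c)]


def coalesce_expression(columns):
    """Join the remaining columns into one space-separated text expression."""
    parts = [f"COALESCE({c}::text, '')" for c in columns]
    return " || ' ' || ".join(parts)


def recreate_index_sql(table, columns):
    """Step 2: statements a migration could run to rebuild the trigram index."""
    index = f"{table}_trgm_text_search_idx"
    expr = coalesce_expression(searchable_columns(columns))
    return [
        f"DROP INDEX IF EXISTS {index}",
        f"CREATE INDEX {index} ON {table} USING gin (({expr}) gin_trgm_ops)",
    ]


for statement in recreate_index_sql("collections", COLLECTIONS_COLUMNS):
    print(statement)
```

For collections, this leaves only name, description, properties, and file_names in the indexed expression. Rebuilding a GIN index on a large table can take a while, so a real migration would want to account for that.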

Subtasks 1 (0 open, 1 closed)

Task #21826: Review 21815-trigrams-exclude-ids (Resolved, Brett Smith, 06/13/2024)

Related issues 1 (0 open, 1 closed)

Related to Arvados - Bug #22052: Exclude container_image column from container_requests trigram index (Resolved, Brett Smith)
Actions #1

Updated by Peter Amstutz 7 months ago

  • Release set to 70
  • Description updated (diff)
Actions #3

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 7 months ago

  • Target version changed from 439 to Development 2024-06-05 sprint
Actions #5

Updated by Peter Amstutz 7 months ago

  • Assigned To set to Brett Smith
Actions #6

Updated by Brett Smith 7 months ago

  • Description updated (diff)
Actions #7

Updated by Brett Smith 7 months ago

  • Description updated (diff)
Actions #8

Updated by Peter Amstutz 7 months ago

  • Status changed from New to In Progress
Actions #9

Updated by Brett Smith 7 months ago

21815-trigrams-exclude-ids @ d8d02bf3190eeb2df5d488e7fd3f489f0f9d5cc5 - developer-run-tests: #4273

  • All agreed upon points are implemented / addressed.
    • Yes
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual; what manual testing was done is described.
    • See above. Added a test to confirm that UUID and hash columns are excluded. Note that the test immediately above the new one tests that the index exists with the desired columns.
  • Documentation has been updated.
    • Added an upgrade note about the potentially slow migration.
  • Behaves appropriately at the intended scale (describe intended scale).
    • No change in scale.
  • Considered backwards and forwards compatibility issues between client and server.
    • No compatibility change, just better search results
  • Follows our coding standards and GUI style guidelines.
    • Yes
Actions #11

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-06-05 sprint to Development 2024-06-19 sprint
Actions #12

Updated by Tom Clegg 6 months ago

This LGTM, thanks.

Actions #13

Updated by Brett Smith 6 months ago

  • Status changed from In Progress to Resolved
Actions #14

Updated by Brett Smith 4 months ago

  • Related to Bug #22052: Exclude container_image column from container_requests trigram index added