Project

General

Profile

Actions

Feature #21815

closed

Exclude identifiers from trigram search

Added by Peter Amstutz 6 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Story points:
-
Release:
Release relationship:
Auto

Description

Inspired by #21737

This is what the full search indexes on (the operation is building a string with all the fields separated by spaces and then indexing on that):

CREATE INDEX collections_trgm_text_search_idx ON public.collections USING gin (((((((((((((((((((
  COALESCE(owner_uuid, ''::character varying))::text || ' '::text) || (
  COALESCE(modified_by_client_uuid, ''::character varying))::text) || ' '::text) || (
  COALESCE(modified_by_user_uuid, ''::character varying))::text) || ' '::text) || (
  COALESCE(portable_data_hash, ''::character varying))::text) || ' '::text) || (
  COALESCE(uuid, ''::character varying))::text) || ' '::text) || (
  COALESCE(name, ''::character varying))::text) || ' '::text) || (
  COALESCE(description, ''::character varying))::text) || ' '::text) || 
  COALESCE((properties)::text, ''::text)) || ' '::text) || 
  COALESCE(file_names, ''::text)))
gin_trgm_ops)

Looking at this, I think it would be much better if all uuid fields and the portable data hash were excluded.

The reasoning is that uuids and the PDH are a string of random alphanumeric characters, generating a lot of trigrams which become potential matches, but not actual matches.

Task:

  1. exclude portable_data_hash and any field ending _uuid from full_text_searchable_columns
  2. add a migration that recreates the trigram indexes for each table with the new full_text_coalesce

Subtasks 1 (0 open1 closed)

Task #21826: Review 21815-trigrams-exclude-idsResolvedBrett Smith06/13/2024Actions

Related issues

Related to Arvados - Bug #22052: Exclude container_image column from container_requests trigram indexResolvedBrett SmithActions
Actions

Also available in: Atom PDF