Idea #21738

open

Text search queries are slow, especially for strings of numbers

Added by Peter Amstutz 20 days ago. Updated 20 days ago.

Status: New
Priority: Normal
Assigned To: -
Category: API
Target version:
Start date:
Due date:
Story points: -

Description

This is what the full text search indexes on. The index expression concatenates all the searchable fields into a single string, separated by spaces, and builds a trigram index over that string:

CREATE INDEX collections_trgm_text_search_idx ON public.collections USING gin (((((((((((((((((((COALESCE)::text || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || COALESCE::text, ''::text)) || ' '::text) || COALESCE)) public.gin_trgm_ops);

Looking at this, I think search would perform much better if all uuid fields and the portable data hash (PDH) were excluded from the index.

The reasoning is that uuids and PDHs are strings of effectively random alphanumeric characters. They generate a lot of trigrams that become potential matches in the index but not actual matches, so each candidate row has to be rechecked and then discarded.
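
To illustrate the point about trigram counts, here is a minimal Python sketch that approximates how pg_trgm extracts trigrams (lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then take every 3-character window). The `trigrams` helper and the sample uuid and PDH values are made up for illustration and are not taken from Arvados; the real extension may differ in details such as locale handling.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction (illustrative, not the real implementation)."""
    result = set()
    # pg_trgm treats runs of alphanumeric characters as words.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Each word is padded with two spaces in front and one behind.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


# Hypothetical sample values, chosen only to have a realistic shape.
sample_name = "cat"
sample_uuid = "zzzzz-4zz18-0123456789abcde"
sample_pdh = "d41d8cd98f00b204e9800998ecf8427e+0"

for label, value in [("name", sample_name), ("uuid", sample_uuid), ("pdh", sample_pdh)]:
    print(label, len(trigrams(value)), "distinct trigrams")
```

Running this on a short descriptive name versus a uuid or PDH shows the identifier fields contributing many more distinct trigrams per row, which is consistent with the concern that they inflate the index with entries that rarely correspond to what a user is actually searching for.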


Related issues

Related to Arvados - Bug #21737: Bad performance on cluster search (New)

#1

Updated by Peter Amstutz 20 days ago

  • Related to Bug #21737: Bad performance on cluster search added

#2

Updated by Peter Amstutz 20 days ago

  • Description updated (diff)

#3

Updated by Peter Amstutz 20 days ago

  • Tracker changed from Bug to Idea