Idea #21738: Text search queries are slow, especially for strings of numbers - Arvados

open

Added by Peter Amstutz 20 days ago. Updated 20 days ago.

Status: New
Priority: Normal
Assigned To:
Category: API
Target version: Future
Start date:
Due date:
Story points:

Description

This is the expression the full text search index is built on. It concatenates all the searchable fields into a single string, separated by spaces, and then builds a trigram index on that string:

CREATE INDEX collections_trgm_text_search_idx ON public.collections USING gin (((((((((((((((((((COALESCE)::text || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || COALESCE::text, ''::text)) || ' '::text) || COALESCE)) public.gin_trgm_ops);

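Looking at this, I think it would be much better if all uuid fields and the portable data hash (PDH) were excluded from the index.

The reasoning is that uuids and the PDH are strings of effectively random alphanumeric characters. They generate a lot of trigrams that show up as potential matches in the index but turn out not to be actual matches, which would explain why searches for strings of numbers are especially slow.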
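
To illustrate the reasoning, here is a minimal Python sketch of how a trigram lookup can flag rows as candidates that do not actually contain the search string. It is not how pg_trgm is implemented: the `trigrams` helper is simplified (it skips the word-boundary padding that pg_trgm applies), and every function name, the random hex rows, and the query string are hypothetical stand-ins for uuid/PDH-like content.

```python
import random


def trigrams(s):
    """Simplified trigram set: every 3-character window of the lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def is_candidate(query, text):
    """Index-style test: the text contains every trigram of the query."""
    return trigrams(query) <= trigrams(text)


def is_match(query, text):
    """Recheck-style test: the text really contains the query as a substring."""
    return query.lower() in text.lower()


def count_hits(query, texts):
    """Return (rows the trigram test lets through, rows that truly match)."""
    candidates = sum(is_candidate(query, t) for t in texts)
    matches = sum(is_match(query, t) for t in texts)
    return candidates, matches


def random_hex_rows(n, fields=6, length=32, seed=0):
    """Hypothetical rows: several random hex strings joined by spaces."""
    rng = random.Random(seed)
    hexdigits = "0123456789abcdef"
    return [
        " ".join(
            "".join(rng.choice(hexdigits) for _ in range(length))
            for _ in range(fields)
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    rows = random_hex_rows(2000)
    candidates, matches = count_hits("12121", rows)
    # Candidates that are not matches are the rows a recheck has to discard.
    print(f"candidates={candidates} matches={matches}")
```

For example, `"a212x121"` contains both trigrams of the query `"1212"` (`121` and `212`) but not the query itself, so it is a candidate without being a match. The more random hex-like text each row contributes to the indexed string, the more often this can happen for numeric search terms.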
