Feature #21815
Exclude identifiers from trigram search
Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Story points: -
Release:
Release relationship: Auto
Description
Inspired by #21737
This is what the full text search currently indexes on. The expression builds a single string from all the fields, separated by spaces, and the index is built on that string:
CREATE INDEX collections_trgm_text_search_idx ON public.collections USING gin ((((((((((((((((((( COALESCE(owner_uuid, ''::character varying))::text || ' '::text) || ( COALESCE(modified_by_client_uuid, ''::character varying))::text) || ' '::text) || ( COALESCE(modified_by_user_uuid, ''::character varying))::text) || ' '::text) || ( COALESCE(portable_data_hash, ''::character varying))::text) || ' '::text) || ( COALESCE(uuid, ''::character varying))::text) || ' '::text) || ( COALESCE(name, ''::character varying))::text) || ' '::text) || ( COALESCE(description, ''::character varying))::text) || ' '::text) || COALESCE((properties)::text, ''::text)) || ' '::text) || COALESCE(file_names, ''::text))) gin_trgm_ops)
Looking at this, I think it would be much better if all uuid fields and the portable data hash were excluded.
The reasoning is that uuids and the PDH are strings of effectively random alphanumeric characters, so they generate a lot of trigrams which become potential matches, but not actual matches.
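To illustrate the reasoning above, here is a rough Python sketch of how pg_trgm-style trigram extraction treats an identifier. The `trigrams` helper is hypothetical and only approximates PostgreSQL's behavior (lowercase the text, split on non-alphanumeric characters, pad each word with two leading spaces and one trailing space). The sample UUID is made up for the example.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction (an illustration, not PostgreSQL's code)."""
    grams = set()
    # Assumption: words are runs of ASCII letters and digits; everything else separates words.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


# Made-up identifier in the general shape of a UUID column value.
sample_uuid = "zzzzz-4zz18-0123456789abcde"
uuid_grams = trigrams(sample_uuid)

# Trigrams that an unrelated search term shares with the identifier.
# Every shared trigram makes the row a candidate that the index cannot rule out.
shared = trigrams("abcdef") & uuid_grams
print(len(uuid_grams), sorted(shared))
```

Each identifier column in a row contributes its own batch of trigrams like these, so a search term can share trigrams with an identifier even though the term does not appear in the row's name, description, or other descriptive fields.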
Task:
- exclude portable_data_hash and any field ending in _uuid from full_text_searchable_columns
- add a migration that recreates the trigram indexes for each table with the new full_text_coalesce
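The two tasks above can be sketched roughly as follows. This is a Python illustration only, not the actual Arvados implementation: `searchable_columns` and `coalesce_expression` are hypothetical helpers, the generated SQL is simplified compared to the index definition quoted in the description, and treating the bare `uuid` column as excluded is an assumption based on the description's "all uuid fields".

```python
def searchable_columns(columns):
    """Drop identifier-like columns from a list of column names (hypothetical helper)."""
    return [
        name for name in columns
        # Assumption: exclude portable_data_hash, uuid itself, and any *_uuid column.
        if name != "portable_data_hash"
        and name != "uuid"
        and not name.endswith("_uuid")
    ]


def coalesce_expression(columns):
    """Build a space-separated concatenation of the columns as a SQL expression string."""
    parts = ["COALESCE({}::text, '')".format(name) for name in columns]
    return " || ' ' || ".join(parts)


# Column names taken from the collections index shown in the description.
collection_columns = [
    "owner_uuid", "modified_by_client_uuid", "modified_by_user_uuid",
    "portable_data_hash", "uuid", "name", "description", "properties",
    "file_names",
]

kept = searchable_columns(collection_columns)
# A migration could drop the old index and recreate it with an expression like this one.
print(
    "CREATE INDEX collections_trgm_text_search_idx ON collections "
    "USING gin ((" + coalesce_expression(kept) + ") gin_trgm_ops)"
)
```

With the identifier columns filtered out, only name, description, properties, and file_names remain in the indexed string for collections.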
Updated by Peter Amstutz 6 months ago
- Release set to 70
- Description updated (diff)
Updated by Peter Amstutz 6 months ago
- Target version changed from 439 to Development 2024-06-05 sprint
Updated by Brett Smith 6 months ago
21815-trigrams-exclude-ids @ d8d02bf3190eeb2df5d488e7fd3f489f0f9d5cc5 - developer-run-tests: #4273
- All agreed upon points are implemented / addressed.
  - Yes
- Anything not implemented (discovered or discussed during work) has a follow-up story.
  - N/A
- Code is tested and passing, both automated and manual; what manual testing was done is described.
  - See above. Added a test to confirm that UUID and hash columns are excluded. Note that the test immediately above the new one tests that the index exists with the desired columns.
- Documentation has been updated.
  - Added an upgrade note about the potentially slow migration.
- Behaves appropriately at the intended scale (describe intended scale).
  - No change in scale.
- Considered backwards and forwards compatibility issues between client and server.
  - No compatibility change, just better search results.
- Follows our coding standards and GUI style guidelines.
  - Yes
Updated by Peter Amstutz 6 months ago
- Target version changed from Development 2024-06-05 sprint to Development 2024-06-19 sprint
Updated by Brett Smith 5 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|7f0f12c40238f3eb12a51877a755cf22357e0767.
Updated by Brett Smith 3 months ago
- Related to Bug #22052: Exclude container_image column from container_requests trigram index added