Feature #14573

closed

[Spike] [API] Fully functional filename search

Added by Tom Clegg over 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Story points: 2.0
Release relationship: Auto

Description

See #13752, #14560 for previous attempts. Problems found so far:
  • Indexing on text fields cannot handle medium-size text inputs.
  • Indexing on to_tsvector(...) cannot handle certain large text inputs (the limit depends on content, not just size). Result: a crash when creating the index or when inserting a row, whichever happens last.
Approaches that have been considered:
  • Add a tsvector column. Populate it with to_tsvector(...) where possible. Where not possible, either populate with partial content (to_tsvector(substring(...))), or leave it null and adjust the search query to do an unindexed fulltext search on such rows. A function with an exception clause might work.
  • Use something other than PostgreSQL for text search.
  • Index of files in collections
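The fallback order in the first approach (full content, then partial content, then null) can be sketched in Python as below. This is only an illustration of the control flow, not a tested design: toy_to_vector is a hypothetical stand-in for to_tsvector(...) whose made-up limit depends on the distinct words in the input, and build_search_vector, prefix_len and the status labels are names invented for this sketch. In the database itself this logic would presumably live in a function with an exception clause, as noted above.

```python
def toy_to_vector(text, limit=32):
    """Hypothetical stand-in for to_tsvector(...).

    Fails when the distinct words exceed a made-up size limit, so the
    failure depends on content rather than on raw input length.
    """
    lexemes = sorted(set(text.lower().split()))
    if sum(len(lexeme) for lexeme in lexemes) > limit:
        raise ValueError("input too large for toy vector")
    return lexemes


def build_search_vector(text, to_vector=toy_to_vector, prefix_len=100_000):
    """Return (vector, status) using the fallback order from the ticket.

    status is "full", "partial" (built from a prefix of the text), or
    "unindexed" (vector is None, so the row needs an unindexed search).
    """
    try:
        return to_vector(text), "full"
    except ValueError:
        pass  # fall through to the truncated attempt
    try:
        return to_vector(text[:prefix_len]), "partial"
    except ValueError:
        return None, "unindexed"


if __name__ == "__main__":
    many_words = " ".join(f"w{i}" for i in range(100))
    print(build_search_vector("x " * 1000))
    print(build_search_vector(many_words, prefix_len=10))
    print(build_search_vector("z" * 100, prefix_len=50))
```

Rows that end up "partial" or "unindexed" are the ones the search query would have to treat specially.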

Spike goal: validate that the Index of files in collections approach can return the desired results, and performs well on a production-size database.

Suggested implementation:
  • Retrieve all collections from a production-size cluster, extract the pdh/dir/file/size info, and insert into a table on a dev database.
  • Try various ways of indexing/reformatting the dir/filenames so the example searches run quickly and return useful results.
  • Provide table of speeds/results for various approaches.
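A rough sketch of the steps above, under several assumptions: manifest text is parsed as space-separated stream lines whose file tokens look like position:size:name, with backslash-octal escapes such as \040 for a space; an in-memory SQLite table stands in for the dev PostgreSQL database so the example runs on its own; and the table name, column names, index, sample manifest and pdh value are all made up for illustration. Timings from SQLite say nothing about PostgreSQL. The point is only the shape of the extract/insert/search/measure loop.

```python
import re
import sqlite3
import time

# Assumed file token shape in a manifest stream line: "position:size:name".
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(.+)$")

# Made-up sample manifest: one file in ".", one two-segment file in "./reads".
MANIFEST = (
    ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt\n"
    "./reads 37b51d194a7513e45b56f6524f2d51f2+3 73feffa4b7f6bb68e44cf984c85f6e88+3 "
    "0:2:sample\\0401.fastq 2:4:sample\\0401.fastq\n"
)


def unescape(name):
    """Decode backslash-octal escapes (ASCII only in this sketch)."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), name)


def manifest_files(pdh, manifest_text):
    """Yield (pdh, dir, file, size) rows, summing segments of the same file."""
    sizes = {}
    for line in manifest_text.splitlines():
        tokens = line.split(" ")
        if not tokens[0]:
            continue  # skip blank lines
        stream = unescape(tokens[0])
        for token in tokens[1:]:
            m = FILE_TOKEN.match(token)
            if m is None:
                continue  # block locator, not a file token
            full_path = stream + "/" + unescape(m.group(3))
            dirname, _, filename = full_path.rpartition("/")
            key = (dirname, filename)
            sizes[key] = sizes.get(key, 0) + int(m.group(2))
    for (dirname, filename), size in sizes.items():
        yield (pdh, dirname, filename, size)


def build_index(rows):
    """Load rows into a hypothetical files table with an index on file."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (pdh TEXT, dir TEXT, file TEXT, size INTEGER)")
    db.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)
    db.execute("CREATE INDEX files_file_idx ON files (file)")
    return db


def timed_search(db, pattern):
    """Run one example LIKE search and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    found = db.execute(
        "SELECT pdh, dir, file, size FROM files WHERE file LIKE ? ORDER BY dir, file",
        (pattern,),
    ).fetchall()
    return found, time.perf_counter() - start


if __name__ == "__main__":
    db = build_index(manifest_files("hypothetical-pdh", MANIFEST))
    for pattern in ("%.fastq", "foo%", "%sample%"):
        found, elapsed = timed_search(db, pattern)
        print(f"{pattern!r}: {len(found)} rows in {elapsed:.6f}s")
```

Repeating timed_search for each candidate indexing or reformatting scheme would give the rows for the speed/results table.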

Related issues

Related to Arvados - Feature #15106: [API] Index 'like' queries and use for search (Resolved, Eric Biagiotti, 06/14/2019)
Has duplicate Arvados - Idea #13508: Fix postgres search for filenames (Duplicate)
Has duplicate Arvados - Idea #14611: [Epic] Site-wide search for text, filenames, data (Duplicate)