Idea #14611

closed

[Epic] Site-wide search for text, filenames, data

Added by Tom Clegg over 5 years ago. Updated about 5 years ago.

Status: Duplicate
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

Arvados has a "site-wide search" feature, but it often fails to meet users' expectations:
  • Full-text search doesn't find exact strings (#13508) and doesn't index all filenames in large collections (#13752, #14560).
  • Substring search is slow, and doesn't index full rows (this is why full-text search was added).
  • No facility at all for searching file contents.

It may be possible to use PostgreSQL's full-text search to address everything short of searching file contents, with a bit more work on our side: for example, using a dictionary/language other than English, or creating a table of filenames instead of searching a huge text field that contains a list of filenames.
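
As a rough illustration of the "table of filenames" idea, here is a minimal Python sketch. It assumes a hypothetical `collection_files` table with one row per file, indexed with PostgreSQL's built-in `simple` text search configuration (no English stemming or stop words). The table, column, and function names are made up for illustration and are not part of the Arvados schema. Splitting paths on non-alphanumeric characters before indexing is also an assumption, chosen so that the tokens do not depend on how the PostgreSQL parser treats paths.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical schema: one row per file, so no single tsvector has to hold
# the filenames of an entire large collection.
CREATE_TABLE_SQL = """
CREATE TABLE collection_files (
    collection_uuid text NOT NULL,
    path            text NOT NULL,
    path_tsv        tsvector
);
CREATE INDEX collection_files_tsv_idx
    ON collection_files USING gin (path_tsv);
"""

# 'simple' is a built-in PostgreSQL text search configuration that lowercases
# tokens but does not apply English stemming or stop-word removal.
INSERT_SQL = (
    "INSERT INTO collection_files (collection_uuid, path, path_tsv) "
    "VALUES (%s, %s, to_tsvector('simple', %s))"
)

SEARCH_SQL = (
    "SELECT collection_uuid, path FROM collection_files "
    "WHERE path_tsv @@ plainto_tsquery('simple', %s)"
)


def searchable_text(path: str) -> str:
    """Replace every run of non-alphanumeric characters with a single space."""
    return re.sub(r"[^0-9A-Za-z]+", " ", path).strip()


def file_rows(collection_uuid: str, paths: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Build one (uuid, path, searchable text) parameter tuple per file for INSERT_SQL."""
    return [(collection_uuid, p, searchable_text(p)) for p in paths]
```

The parameter tuples could be passed to a database driver's `executemany` together with `INSERT_SQL`. Whether this tokenization gives acceptable results for exact-string searches (#13508) would still need to be tested.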

Another approach would be to use a separate tool to index/search the database, and apply Arvados permissions to those results. This could conceivably index file contents as well as database rows.
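
One way to picture "apply Arvados permissions to those results" is a post-filter over the hits returned by the external index. The sketch below is only an assumption about how that could look: the hit format (a dict with a `uuid` key) and the precomputed set of readable UUIDs are hypothetical, and how that set would be obtained from Arvados is not covered here.

```python
from typing import Dict, Iterable, List, Set


def filter_readable(hits: Iterable[Dict[str, str]], readable_uuids: Set[str]) -> List[Dict[str, str]]:
    """Keep only the search hits whose object UUID the current user may read.

    `hits` is the raw result list from a hypothetical external search tool;
    `readable_uuids` is a hypothetical precomputed set of UUIDs the user can read.
    The original order of the hits is preserved.
    """
    return [hit for hit in hits if hit["uuid"] in readable_uuids]


# Example with made-up data: the second hit is dropped because it is not readable.
example_hits = [
    {"uuid": "coll-1", "path": "reads/sample.fastq"},
    {"uuid": "coll-2", "path": "notes/secret.txt"},
]
visible = filter_readable(example_hits, {"coll-1"})
```

A post-filter like this can return fewer results than the page size requested from the index, so a real implementation would probably need to over-fetch or push the permission check into the index query itself.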


Related issues

Related to Arvados - Idea #13508: Fix postgres search for filenames (Duplicate)
Related to Arvados - Bug #14560: [1.3.0] error: ERROR: string is too long for tsvector (2299194 bytes, max 1048575 bytes) (Resolved, Tom Clegg)
Related to Arvados - Bug #6382: [Workbench] Searching through a collection using regex should accept $ instead of \n (Closed, 06/22/2015)
Is duplicate of Arvados - Feature #14573: [Spike] [API] Fully functional filename search (Resolved, Peter Amstutz)
#1

Updated by Tom Clegg over 5 years ago

  • Related to Idea #13508: Fix postgres search for filenames added
#3

Updated by Tom Clegg over 5 years ago

  • Related to Bug #14560: [1.3.0] error: ERROR: string is too long for tsvector (2299194 bytes, max 1048575 bytes) added
#4

Updated by Tom Clegg over 5 years ago

  • Related to Bug #6382: [Workbench] Searching through a collection using regex should accept $ instead of \n added
#5

Updated by Peter Amstutz over 5 years ago

I like the idea of a hybrid solution that uses PostgreSQL full-text search for fields such as name and description, and a specialized database for indexing collection contents (both filenames and document contents). We need to be careful not to start storing reads from fastq files in the full-text database, though.

#6

Updated by Tom Morris about 5 years ago

  • Target version set to To Be Groomed
#7

Updated by Tom Clegg about 5 years ago

  • Is duplicate of Feature #14573: [Spike] [API] Fully functional filename search added
#8

Updated by Tom Clegg about 5 years ago

  • Status changed from New to Duplicate
  • Target version deleted (To Be Groomed)