[Epic] Site-wide search for text, filenames, data
- Full-text search doesn't find exact strings (#13508) and doesn't index all filenames in large collections (#13752, #14560).
- Substring search is slow, and doesn't index full rows (this is why full-text search was added).
- No facility at all for searching file contents.
PostgreSQL's full-text search could possibly address everything short of searching file contents, with a bit more work on our side: use a dictionary/language other than English, create a table of filenames instead of searching a huge text field containing a list of filenames, etc.
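As a rough sketch of the PostgreSQL-only idea, the snippet below pairs a per-file table with the built-in `simple` text search configuration, which does no stemming and drops no stop words. The table name `collection_filenames`, its columns, and the helper `filename_search_query` are hypothetical, not part of the existing schema, and the `%s` placeholder assumes a psycopg2-style driver. How the default parser tokenizes filenames would still need checking.

```python
# Hypothetical schema: one row per file, instead of one large text field
# per collection holding the whole list of filenames.
CREATE_FILENAMES_TABLE = """
CREATE TABLE collection_filenames (
    collection_uuid varchar(255) NOT NULL,
    filename text NOT NULL
);
CREATE INDEX collection_filenames_fts_idx
    ON collection_filenames
    USING gin (to_tsvector('simple', filename));
"""


def filename_search_query(terms):
    """Return (sql, params) for a parameterized full-text filename search.

    The 'simple' configuration is used on both sides so that the query
    matches the expression in the index above.
    """
    sql = (
        "SELECT DISTINCT collection_uuid FROM collection_filenames "
        "WHERE to_tsvector('simple', filename) "
        "@@ plainto_tsquery('simple', %s)"
    )
    return sql, (" ".join(terms),)
```

The query text stays constant and only the parameter varies, so it could be passed directly to a cursor's `execute(sql, params)`.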
Another approach would be to use a separate tool to index/search the database, and apply Arvados permissions to those results. This could conceivably index file contents as well as database rows.
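One way the "apply Arvados permissions to those results" step might look is a post-filter over hits from the external index. This is only a sketch under assumptions: `permitted_hits` and the `readable_subset` callback are hypothetical names, and the callback stands in for whatever permission check the API server would actually perform for the requesting user.

```python
from itertools import islice


def permitted_hits(hits, readable_subset, limit, batch_size=100):
    """Collect up to `limit` hits that the current user may read.

    hits: iterable of dicts with a "uuid" key, in ranked order, as
        returned by the (hypothetical) external search index.
    readable_subset: callable taking a list of UUIDs and returning the
        set of those UUIDs the user can read (assumed to be backed by
        the existing permission system).
    """
    results = []
    it = iter(hits)
    while len(results) < limit:
        # Check permissions one batch at a time to bound the number of
        # permission lookups per search request.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        allowed = readable_subset([h["uuid"] for h in batch])
        for h in batch:
            if h["uuid"] in allowed:
                results.append(h)
                if len(results) == limit:
                    break
    return results
```

Filtering after the search keeps the index itself permission-agnostic, at the cost of possibly fetching several batches when most top-ranked hits are not readable by the user.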
#5 Updated by Peter Amstutz about 2 years ago
I like the idea of a hybrid solution that uses PG full-text search for fields like name and description, and a specialized database for indexing collection contents (both filenames and the contents of documents). We need to be careful that we don't start storing reads from fastq files in the full-text database, though.
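A guard against indexing fastq reads could be as simple as a check applied before a file's contents are sent to the indexer. The sketch below is illustrative only: `should_index_contents`, the extension list, and the size cap are all assumptions, and filenames would still be indexed regardless of the result.

```python
import os

# Assumed list of extensions whose contents should never be indexed.
SKIP_CONTENT_EXTENSIONS = {".fastq", ".fq"}
# Compression suffixes to look through, e.g. "sample.fastq.gz".
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz"}


def should_index_contents(filename, size_bytes, max_bytes=10 * 1024 * 1024):
    """Decide whether a file's contents should go into the content index.

    Returns False for fastq-style files (compressed or not) and for any
    file larger than max_bytes (an arbitrary, assumed cap).
    """
    root, ext = os.path.splitext(filename.lower())
    if ext in COMPRESSION_SUFFIXES:
        # Strip the compression suffix and inspect the inner extension.
        root, ext = os.path.splitext(root)
    if ext in SKIP_CONTENT_EXTENSIONS:
        return False
    return size_bytes <= max_bytes
```

For example, `should_index_contents("reads/sample_R1.fastq.gz", 100)` is rejected because of its inner extension, while a small `notes.txt` is accepted.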