Feature #14573

[API] Fully functional filename search

Added by Tom Clegg 12 days ago. Updated 10 days ago.



See #13752, #14560 for previous attempts.
  • Indexing on text fields cannot handle medium-size text inputs.
  • Indexing on to_tsvector(...) cannot handle certain large text inputs (limit depends on content, not just size). Result: crash when creating the index or when inserting a row, whichever happens last.
Possible approach:
  • Add a tsvector column. Populate it with to_tsvector(...) where possible. Where not possible, either populate with partial content (to_tsvector(substring(...))), or leave it null and adjust the search query to do an unindexed fulltext search on such rows. A function with an exception clause might work.
  • Use something other than PostgreSQL for text search.
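
The fallback order in the first approach (full vector, then partial content, then null) might look roughly like the sketch below. This is only an illustration in Python, not the proposed implementation: `build_search_vector`, `fake_to_tsvector`, `prefix_chars` and the status strings are all hypothetical names, and `fake_to_tsvector` is a stand-in that fails on "too many" distinct words rather than a real call to Postgres. In the database itself the same logic would presumably live in a function with an exception clause, as suggested above.

```python
def fake_to_tsvector(text, max_lexemes=5):
    """Stand-in for to_tsvector(): fails when the input has too many distinct words.

    Hypothetical helper for illustration only; the real limit depends on content.
    """
    lexemes = sorted(set(text.lower().split()))
    if len(lexemes) > max_lexemes:
        raise ValueError("vector too large")
    return lexemes


def build_search_vector(text, to_tsvector=fake_to_tsvector, prefix_chars=100_000):
    """Return (vector, status), trying the full text, then a prefix, then giving up."""
    try:
        # Preferred case: the whole text can be converted and indexed.
        return to_tsvector(text), "full"
    except ValueError:
        pass
    try:
        # Fallback: index only the beginning of the text (partial content).
        return to_tsvector(text[:prefix_chars]), "partial"
    except ValueError:
        # Last resort: store null; the search query must scan such rows unindexed.
        return None, "unindexed"
```

Rows that come back as "unindexed" are the ones the search query would need to handle with an unindexed fulltext search.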


#1 Updated by Tom Clegg 12 days ago

  • Tracker changed from Bug to Feature

#2 Updated by Tom Clegg 12 days ago

  • Description updated (diff)

#3 Updated by Peter Amstutz 10 days ago

Postgres text search has other problems when it comes to segmenting filenames. However, I don't think that means we should give up on Postgres; we should create our own filename search table that has the behavior we want.


Proposed solution:

Maintain our own filename index in a new table.

keyword → collection PDH

Perform custom filename tokenizing and support prefix search with "like" (can use B-tree indexes). Split on symbols like "_", "-" and ".", and on CamelCase (lower→upper transitions). Convert everything to lowercase. For example:


Would turn into:


Searches would be prefix searches, e.g. a search on "RMF1U7F" would be keyword like 'rmf1u7f%'.
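
A minimal sketch of the tokenizing and prefix lookup described above, assuming ASCII filenames. The table name `filename_keywords`, the function names, and the use of an in-memory SQLite database are assumptions made only so the example runs on its own; the proposal targets Postgres, where a B-tree index can serve `like 'prefix%'` queries when the database uses the C locale or the index is built with the `text_pattern_ops` operator class.

```python
import re
import sqlite3


def tokenize(filename):
    """Split a filename into lowercase keywords (ASCII-only sketch)."""
    # Break CamelCase at each lower->upper transition: "readGroup" -> "read Group".
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", filename)
    # Split on runs of anything that is not a letter or digit ("_", "-", ".", space).
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", spaced) if tok]


def build_index(conn, collections):
    """Fill the keyword table; collections maps a collection PDH to its filenames."""
    conn.execute(
        "CREATE TABLE filename_keywords (keyword TEXT NOT NULL, pdh TEXT NOT NULL)"
    )
    # Ordinary index on keyword, as the proposal suggests.
    conn.execute(
        "CREATE INDEX filename_keywords_kw ON filename_keywords (keyword, pdh)"
    )
    rows = {
        (kw, pdh)
        for pdh, names in collections.items()
        for name in names
        for kw in tokenize(name)
    }
    conn.executemany(
        "INSERT INTO filename_keywords (keyword, pdh) VALUES (?, ?)", sorted(rows)
    )


def search(conn, term):
    """Return PDHs that have a keyword starting with the lowercased term."""
    # Escape LIKE wildcards so the term is matched literally, then add the prefix "%".
    escaped = re.sub(r"([\\%_])", r"\\\1", term.lower())
    cur = conn.execute(
        "SELECT DISTINCT pdh FROM filename_keywords "
        "WHERE keyword LIKE ? ESCAPE '\\' ORDER BY pdh",
        (escaped + "%",),
    )
    return [row[0] for row in cur]
```

With a made-up filename such as "Sample_RMF1U7F-readGroup.fastq.gz", `tokenize` yields the keywords sample, rmf1u7f, read, group, fastq and gz, so `search(conn, "RMF1U7F")` finds the collection through the pattern 'rmf1u7f%'.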
