Project

General

Profile

Feature #14573

Updated by Tom Clegg almost 6 years ago

See #13752, #14560 for previous attempts. 
 * Indexing on text fields cannot handle medium-size text inputs. 
 * Indexing on to_tsvector(...) cannot handle certain large text inputs (limit depends on content, not just size). Result: crash when creating the index or when inserting a row, whichever happens last. 

 Approaches that have been considered: Possible approach: 
 * Add a tsvector column. Populate it with to_tsvector(...) where possible. Where not possible, either populate with partial content (to_tsvector(substring(...))), or leave it null and adjust the search query to do an unindexed fulltext search on such rows. A function with an "exception clause":https://www.postgresql.org/docs/9.6/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING might work. 
 * Use something other than Postgresql for text search. 
 * [[Index of files in collections]] 

 *Spike goal: validate that the [[Index of files in collections]] approach can return the desired results, and performs well on a production-size database.* 

 Suggested implementation: 
 * Retrieve all collections from a production-size cluster, extract the pdh/dir/file/size info, and insert into a table on a dev database. 
 * Try various ways of indexing/reformatting the dir/filenames so the example searches run quickly and return useful results. 
 * Provide table of speeds/results for various approaches. 

Back