Bug #21737
openBad performance on cluster search
Description
- we are not sure if this is a regression in 2.7.2, was observed just after the update
- Seems to scale with the size of the data, ARKAU didn't complete in time, but after doing a clustering of ARKAU it completed in 1.8 minutes, the same search takes 35s in ARPEZ and 13s in ARIND.
The original search is this one, if you limit the search to type group it finishes in the moment
%22any%22,%22ilike%22,%22%258458829%25%22],[%22uuid%22,%22is_a%22,[%22arvados%23collection%22,[%22uuid%22,%22is_a%22,[%22arvados%23group%22,%22arvados%23collection%22]],[%22collections.properties.type%22,%22not+in%22,[%22log%22,%22intermediate%22]]]&order=collections.modified_at+desc,container_requests.modified_at+desc,groups.modified_at+desc,container_requests.created_at+desc">https://api.arkau.roche.com/arvados/v1/groups/contents?limit=50&offset=0&include_trash=false&include_old_versions=false&filters=%22any%22,%22ilike%22,%22%258458829%25%22],[%22uuid%22,%22is_a%22,[%22arvados%23collection%22,[%22uuid%22,%22is_a%22,[%22arvados%23group%22,%22arvados%23collection%22]],[%22collections.properties.type%22,%22not+in%22,[%22log%22,%22intermediate%22]]]&order=collections.modified_at+desc,container_requests.modified_at+desc,groups.modified_at+desc,container_requests.created_at+desc`
We think it is a index problem in the DB, could you look at it?
Files
Related issues
Updated by Peter Amstutz 20 days ago
reformatted
https://api.arkau.roche.com/arvados/v1/groups/contents? limit=50&offset=0&include_trash=false&include_old_versions=false &filters=[[%22any%22,%22ilike%22,%22%258458829%25%22],[%22uuid%22,%22is_a%22,[%22arvados%23collection%22]], [%22uuid%22,%22is_a%22,[%22arvados%23group%22,%22arvados%23collection%22]], [%22collections.properties.type%22,%22not+in%22,[%22log%22,%22intermediate%22]]] &order=collections.modified_at+desc,container_requests.modified_at+desc,groups.modified_at+desc,container_requests.created_at+desc
Updated by Peter Amstutz 20 days ago
As discussed at today's meeting, I believe this is a limitation of the search index we are using (pg_trgm)
The trigram index works by breaking the search string into 3-character substrings (trigrams) and looking those up in an index. That yields a set of candidate rows that are scanned for exact substring match.
I believe the problem with searches involving only strings of numeric characters is that a 6 digit identifier is broken into two 3-digit trigams, and those individual trigrams are likely to show up a large number of times, resulting in a large number of false positives (0.1% of the table is a lot when there are millions of rows).
If you are able to run an "explain analyze" on the query, that would be extremely helpful in confirming this theory.
Updated by Peter Amstutz 20 days ago
This is what the full search indexes on (the operation is building a string with all the fields separated by spaces and then indexing on that):
CREATE INDEX collections_trgm_text_search_idx ON public.collections USING gin (((((((((((((((((((COALESCE)::text || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || COALESCE::text, ''::text)) || ' '::text) || COALESCE)) public.gin_trgm_ops);
Looking at this, I think it would be much better if all uuid fields and the portable data hash were excluded.
The reasoning is that uuids and the PDH are a string of random alphanumeric characters, generating a lot of trigrams which become potential matches, but not actual matches.
Updated by Peter Amstutz 20 days ago
container_requests_trgm_text_search_idx has the same problem, in addition the "mounts" field probably needs to be excluded because it is likely to contain a bunch of portable data hashes as well. (For provenance, we could separately index mounts in a way that makes it easy to find input collections).
Updated by Peter Amstutz 20 days ago
- Related to Idea #21738: Text search queries are slow, especially for strings of numbers added
Updated by Ivan Diaz Alvarez 19 days ago
- File rkalvpost2_postgres16.2.txt rkalvpost2_postgres16.2.txt added
- File rkalvpost1_postgres14.5.txt rkalvpost1_postgres14.5.txt added
- Postgres 16.2 (which is installed in rkalvpost2 which will replace rkalvpost1 in ARKAU shortly) is much faster (32s vs 50s) This node is not loaded, but that times matches with the number of rows retrieved in the bitmap index scan
- There are two queries, one that gets the results and another that fetches the count of results, if all was done in one query the time would probably get reduced by half