Bug #21737: Bad performance on cluster search - Arvados

Actions

Copy link

Bug #21737

open

Bad performance on cluster search

Added by Ivan Diaz Alvarez 20 days ago. Updated 19 days ago.

Status:

New

Priority:

Normal

Assigned To:

Category:

Target version:

Future

Story points:

Description

We have observed a slowness while using the search bar to search for collections (particullarly with numeric searches):

we are not sure if this is a regression in 2.7.2, was observed just after the update
Seems to scale with the size of the data, ARKAU didn't complete in time, but after doing a clustering of ARKAU it completed in 1.8 minutes, the same search takes 35s in ARPEZ and 13s in ARIND.

The original search is this one, if you limit the search to type group it finishes in the moment

%22any%22,%22ilike%22,%22%258458829%25%22],[%22uuid%22,%22is_a%22,[%22arvados%23collection%22,[%22uuid%22,%22is_a%22,[%22arvados%23group%22,%22arvados%23collection%22]],[%22collections.properties.type%22,%22not+in%22,[%22log%22,%22intermediate%22]]]&order=collections.modified_at+desc,container_requests.modified_at+desc,groups.modified_at+desc,container_requests.created_at+desc">https://api.arkau.roche.com/arvados/v1/groups/contents?limit=50&offset=0&include_trash=false&include_old_versions=false&filters=%22any%22,%22ilike%22,%22%258458829%25%22],[%22uuid%22,%22is_a%22,[%22arvados%23collection%22,[%22uuid%22,%22is_a%22,[%22arvados%23group%22,%22arvados%23collection%22]],[%22collections.properties.type%22,%22not+in%22,[%22log%22,%22intermediate%22]]]&order=collections.modified_at+desc,container_requests.modified_at+desc,groups.modified_at+desc,container_requests.created_at+desc`

We think it is a index problem in the DB, could you look at it?

Files

Download all files

rkalvpost1_postgres14.5.txt (10.3 KB) rkalvpost1_postgres14.5.txt		Ivan Diaz Alvarez, 04/30/2024 09:58 AM
rkalvpost2_postgres16.2.txt (8.78 KB) rkalvpost2_postgres16.2.txt		Ivan Diaz Alvarez, 04/30/2024 09:58 AM

Related issues

Actions

Copy link

Updated by Peter Amstutz 20 days ago

reformatted

https://api.arkau.roche.com/arvados/v1/groups/contents?
limit=50&offset=0&include_trash=false&include_old_versions=false
&filters=[[%22any%22,%22ilike%22,%22%258458829%25%22],[%22uuid%22,%22is_a%22,[%22arvados%23collection%22]],
[%22uuid%22,%22is_a%22,[%22arvados%23group%22,%22arvados%23collection%22]],
[%22collections.properties.type%22,%22not+in%22,[%22log%22,%22intermediate%22]]]
&order=collections.modified_at+desc,container_requests.modified_at+desc,groups.modified_at+desc,container_requests.created_at+desc

Actions

Copy link

Updated by Peter Amstutz 20 days ago

As discussed at today's meeting, I believe this is a limitation of the search index we are using (pg_trgm)

The trigram index works by breaking the search string into 3-character substrings (trigrams) and looking those up in an index. That yields a set of candidate rows that are scanned for exact substring match.

I believe the problem with searches involving only strings of numeric characters is that a 6 digit identifier is broken into two 3-digit trigams, and those individual trigrams are likely to show up a large number of times, resulting in a large number of false positives (0.1% of the table is a lot when there are millions of rows).

If you are able to run an "explain analyze" on the query, that would be extremely helpful in confirming this theory.

Actions

Copy link

Updated by Peter Amstutz 20 days ago

This is what the full search indexes on (the operation is building a string with all the fields separated by spaces and then indexing on that):

CREATE INDEX collections_trgm_text_search_idx ON public.collections USING gin (((((((((((((((((((COALESCE)::text || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || (COALESCE)::text) || ' '::text) || COALESCE::text, ''::text)) || ' '::text) || COALESCE)) public.gin_trgm_ops);

Looking at this, I think it would be much better if all uuid fields and the portable data hash were excluded.

The reasoning is that uuids and the PDH are a string of random alphanumeric characters, generating a lot of trigrams which become potential matches, but not actual matches.

Actions

Copy link

Updated by Peter Amstutz 20 days ago

container_requests_trgm_text_search_idx has the same problem, in addition the "mounts" field probably needs to be excluded because it is likely to contain a bunch of portable data hashes as well. (For provenance, we could separately index mounts in a way that makes it easy to find input collections).

Actions

Copy link

Updated by Peter Amstutz 20 days ago

Target version set to Future

Actions

Copy link

Updated by Peter Amstutz 20 days ago

Related to Idea #21738: Text search queries are slow, especially for strings of numbers added

Actions

Copy link

Updated by Ivan Diaz Alvarez 19 days ago

File rkalvpost2_postgres16.2.txt rkalvpost2_postgres16.2.txt added
File rkalvpost1_postgres14.5.txt rkalvpost1_postgres14.5.txt added

Hi, I have finally ran the explain analyze on the DB, and:

Postgres 16.2 (which is installed in rkalvpost2 which will replace rkalvpost1 in ARKAU shortly) is much faster (32s vs 50s) This node is not loaded, but that times matches with the number of rows retrieved in the bitmap index scan

There are two queries, one that gets the results and another that fetches the count of results, if all was done in one query the time would probably get reduced by half

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Arvados

Custom queries

Watchers (1)

Bug #21737

Bad performance on cluster search

Updated by Peter Amstutz 20 days ago

Updated by Peter Amstutz 20 days ago

Updated by Peter Amstutz 20 days ago

Updated by Peter Amstutz 20 days ago

Updated by Peter Amstutz 20 days ago

Updated by Peter Amstutz 20 days ago

Updated by Ivan Diaz Alvarez 19 days ago