Bug #14983

[API] searching for "test" on c97qk takes a long time

Added by Peter Amstutz 2 months ago. Updated about 2 months ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Story points:
-

Description

This query:

https://c97qk.arvadosapi.com/arvados/v1/groups/contents?limit=10&offset=0&include_trash=false&filters=[[%22any%22,%22@@%22,%22test%22],[%22uuid%22,%22is_a%22,[%22arvados%23group%22,%22arvados%23collection%22,%22arvados%23containerRequest%22]],[%22uuid%22,%22is_a%22,[%22arvados%23group%22,%22arvados%23containerRequest%22,%22arvados%23collection%22]]]&order=collections.last_modified+desc,container_requests.last_modified+desc,groups.last_modified+desc

takes 37.6 seconds to return.


Related issues

Related to Arvados - Bug #14560: [1.3.0] error: ERROR: string is too long for tsvector (2299194 bytes, max 1048575 bytes)Resolved

History

#1 Updated by Peter Amstutz 2 months ago

  • Status changed from New to In Progress

#2 Updated by Tom Morris 2 months ago

  • Description updated (diff)

#3 Updated by Tom Clegg 2 months ago

Looks like it's mostly waiting for postgres to figure out that 787 of 484K collections match.

Takes <1s if you add &count=none.

#4 Updated by Tom Clegg 2 months ago

  • Status changed from In Progress to New

#5 Updated by Tom Clegg 2 months ago

SELECT COUNT("collections"."id") FROM "collections" WHERE (collections.owner_uuid NOT IN (SELECT target_uuid FROM materialized_permission_view WHERE trashed = 1) AND collections.is_trashed = false AND collections.uuid = collections.current_version_uuid) AND ((to_tsvector('english', substr(coalesce(owner_uuid,'') || ' ' || coalesce(modified_by_client_uuid,'') || ' ' || coalesce(modified_by_user_uuid,'') || ' ' || coalesce(portable_data_hash,'') || ' ' || coalesce(uuid,'') || ' ' || coalesce(name,'') || ' ' || coalesce(description,'') || ' ' || coalesce(properties::text,'') || ' ' || coalesce(file_names,''), 0, 8000)) @@ to_tsquery('test')) AND (1=0 OR 1=1 OR 1=0) AND (1=0 OR 1=0 OR 1=1))

#6 Updated by Tom Clegg 2 months ago

explain analyze SELECT COUNT("collections"."id") FROM "collections" WHERE (collections.owner_uuid NOT IN (SELECT target_uuid FROM materialized_permission_view WHERE trashed = 1) AND collections.is_trashed = false AND collections.uuid = collections.current_version_uuid) AND ((to_tsvector('english', substr(coalesce(owner_uuid,'') || ' ' || coalesce(modified_by_client_uuid,'') || ' ' || coalesce(modified_by_user_uuid,'') || ' ' || coalesce(portable_data_hash,'') || ' ' || coalesce(uuid,'') || ' ' || coalesce(name,'') || ' ' || coalesce(description,'') || ' ' || coalesce(properties::text,'') || ' ' || coalesce(file_names,''), 0, 8000)) @@ to_tsquery('test')) AND (1=0 OR 1=1 OR 1=0) AND (1=0 OR 1=0 OR 1=1))

 Aggregate  (cost=5510.48..5510.49 rows=1 width=4) (actual time=24189.179..24189.179 rows=1 loops=1)
   ->  Index Scan using index_collections_on_owner_uuid_and_name on collections  (cost=6.47..5510.46 rows=5 width=4) (actual time=45.039..24189.151 rows=32 loops=1)
         Filter: ((NOT (hashed SubPlan 1)) AND (to_tsvector('english'::regconfig, substr((((((((((((((((((COALESCE(owner_uuid, ''::character varying))::text || ' '::text) || (COALESCE(modified_by_client_uuid, ''::character varying))::text) || ' '::text) || (COALESCE(modified_by_user_uuid, ''::character varying))::text) || ' '::text) || (COALESCE(portable_data_hash, ''::character varying))::text) || ' '::text) || (COALESCE(uuid, ''::character varying))::text) || ' '::text) || (COALESCE(name, ''::character varying))::text) || ' '::text) || (COALESCE(description, ''::character varying))::text) || ' '::text) || COALESCE((properties)::text, ''::text)) || ' '::text) || COALESCE(file_names, ''::text)), 0, 8000)) @@ to_tsquery('test'::text)))
         Rows Removed by Filter: 479437
         SubPlan 1
           ->  Index Only Scan using permission_target_trashed on materialized_permission_view  (cost=0.28..6.04 rows=1 width=28) (actual time=0.017..0.048 rows=34 loops=1)
                 Index Cond: (trashed = 1)
                 Heap Fetches: 34
 Planning time: 0.278 ms
 Execution time: 24189.271 ms

#7 Updated by Tom Clegg about 2 months ago

The index uses substr(..., 1000000) but the query uses substr(..., 8000).

"explain analyze" execution time drops from 24 s to 0.3 s with the substr() argument changed to 1000000.

#8 Updated by Tom Clegg about 2 months ago

  • Related to Bug #14560: [1.3.0] error: ERROR: string is too long for tsvector (2299194 bytes, max 1048575 bytes) added

Also available in: Atom PDF