Project

General

Profile

Bug #20858

Updated by Peter Amstutz 7 months ago

Observed on user cluster, the query used by @http_to_keep@ to search for a URL to see if it was previously imported into Keep or not, takes 10s - 20s to respond.    It ought to be answered by an index and only take 100ms. 

 Specifically: 

 @r1 r1 = api.collections().list(filters=[["properties", "exists", url]]).execute()@ 

 <pre> 
 arvados_development=# explain analyze select uuid, properties from collections where jsonb_exists(properties, 'type'); 
                                                     QUERY PLAN                                                      
 ------------------------------------------------------------------------------------------------------------------- 
  Seq Scan on collections    (cost=0.00..5616.45 rows=14732 width=112) (actual time=0.056..54.770 rows=33740 loops=1) 
    Filter: jsonb_exists(properties, 'type'::text) 
    Rows Removed by Filter: 10437 
  Planning Time: 0.249 ms 
  Execution Time: 56.583 ms 
 (5 rows) 

 arvados_development=# explain analyze select uuid, properties from collections where properties ? 'type'; 
                                                                   QUERY PLAN                                                                    
 ----------------------------------------------------------------------------------------------------------------------------------------------- 
  Bitmap Heap Scan on collections    (cost=12.34..176.59 rows=44 width=112) (actual time=12.918..43.863 rows=33740 loops=1) 
    Recheck Cond: (properties ? 'type'::text) 
    Heap Blocks: exact=4571 
    ->    Bitmap Index Scan on collection_index_on_properties    (cost=0.00..12.33 rows=44 width=0) (actual time=10.725..10.726 rows=33759 loops=1) 
          Index Cond: (properties ? 'type'::text) 
  Planning Time: 0.252 ms 
  Execution Time: 46.099 ms 
 (7 rows) 
 </pre> 

 Apparently @?@ and @jsonb_exists()@ don't get the same treatment from the query optimizer: 

 https://dba.stackexchange.com/questions/90002/postgresql-operator-uses-index-but-underlying-function-does-not 

 We have the same problem, where using @?@ conflicts with the parameter replacement.    Fortunately, this stack overflow ticket also provides a workaround by defining a postgres function that uses the operator, and can be inlined. 

 <pre> 
 arvados_development=# CREATE OR REPLACE FUNCTION jb_contains(jsonb, text) 
 arvados_development-#     RETURNS bool AS 
 arvados_development-# 'SELECT $1 ? $2' LANGUAGE sql IMMUTABLE; 
 CREATE FUNCTION 


 arvados_development=# explain analyze select uuid, properties from collections where jb_contains(properties, 'type'); 
                                                                   QUERY PLAN                                                                    
 ----------------------------------------------------------------------------------------------------------------------------------------------- 
  Bitmap Heap Scan on collections    (cost=12.34..176.59 rows=44 width=112) (actual time=17.727..44.716 rows=33740 loops=1) 
    Recheck Cond: (properties ? 'type'::text) 
    Heap Blocks: exact=4571 
    ->    Bitmap Index Scan on collection_index_on_properties    (cost=0.00..12.33 rows=44 width=0) (actual time=14.972..14.972 rows=33759 loops=1) 
          Index Cond: (properties ? 'type'::text) 
  Planning Time: 0.276 ms 
  Execution Time: 46.536 ms 
 (7 rows) 

 </pre> url]]).execute() 

Back