Project

General

Profile

Actions

Bug #20858

closed

"properties exists" filter very slow

Added by Peter Amstutz 8 months ago. Updated 8 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Story points:
-
Release relationship:
Auto

Description

Observed on user cluster, the query used by http_to_keep to search for a URL to see if it was previously imported into Keep or not, takes 10s - 20s to respond. It ought to be answered by an index and only take 100ms.

Specifically:

r1 = api.collections().list(filters=[["properties", "exists", url]]).execute()

arvados_development=# explain analyze select uuid, properties from collections where jsonb_exists(properties, 'type');
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on collections  (cost=0.00..5616.45 rows=14732 width=112) (actual time=0.056..54.770 rows=33740 loops=1)
   Filter: jsonb_exists(properties, 'type'::text)
   Rows Removed by Filter: 10437
 Planning Time: 0.249 ms
 Execution Time: 56.583 ms
(5 rows)

arvados_development=# explain analyze select uuid, properties from collections where properties ? 'type';
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on collections  (cost=12.34..176.59 rows=44 width=112) (actual time=12.918..43.863 rows=33740 loops=1)
   Recheck Cond: (properties ? 'type'::text)
   Heap Blocks: exact=4571
   ->  Bitmap Index Scan on collection_index_on_properties  (cost=0.00..12.33 rows=44 width=0) (actual time=10.725..10.726 rows=33759 loops=1)
         Index Cond: (properties ? 'type'::text)
 Planning Time: 0.252 ms
 Execution Time: 46.099 ms
(7 rows)

Apparently ? and jsonb_exists() don't get the same treatment from the query optimizer:

https://dba.stackexchange.com/questions/90002/postgresql-operator-uses-index-but-underlying-function-does-not

We have the same problem, where using ? conflicts with the parameter replacement. Fortunately, this stack overflow ticket also provides a workaround by defining a postgres function that uses the operator, and can be inlined.

arvados_development=# CREATE OR REPLACE FUNCTION jb_contains(jsonb, text)
arvados_development-#   RETURNS bool AS
arvados_development-# 'SELECT $1 ? $2' LANGUAGE sql IMMUTABLE;
CREATE FUNCTION

arvados_development=# explain analyze select uuid, properties from collections where jb_contains(properties, 'type');
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on collections  (cost=12.34..176.59 rows=44 width=112) (actual time=17.727..44.716 rows=33740 loops=1)
   Recheck Cond: (properties ? 'type'::text)
   Heap Blocks: exact=4571
   ->  Bitmap Index Scan on collection_index_on_properties  (cost=0.00..12.33 rows=44 width=0) (actual time=14.972..14.972 rows=33759 loops=1)
         Index Cond: (properties ? 'type'::text)
 Planning Time: 0.276 ms
 Execution Time: 46.536 ms
(7 rows)


Subtasks 1 (0 open1 closed)

Task #20859: Review 20858-jsonb-existsResolvedLucas Di Pentima08/15/2023Actions
Actions

Also available in: Atom PDF