Bug #20858
closed"properties exists" filter very slow
Description
Observed on user cluster, the query used by http_to_keep
to search for a URL to see if it was previously imported into Keep or not, takes 10s - 20s to respond. It ought to be answered by an index and only take 100ms.
Specifically:
r1 = api.collections().list(filters=[["properties", "exists", url]]).execute()
arvados_development=# explain analyze select uuid, properties from collections where jsonb_exists(properties, 'type'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Seq Scan on collections (cost=0.00..5616.45 rows=14732 width=112) (actual time=0.056..54.770 rows=33740 loops=1) Filter: jsonb_exists(properties, 'type'::text) Rows Removed by Filter: 10437 Planning Time: 0.249 ms Execution Time: 56.583 ms (5 rows) arvados_development=# explain analyze select uuid, properties from collections where properties ? 'type'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on collections (cost=12.34..176.59 rows=44 width=112) (actual time=12.918..43.863 rows=33740 loops=1) Recheck Cond: (properties ? 'type'::text) Heap Blocks: exact=4571 -> Bitmap Index Scan on collection_index_on_properties (cost=0.00..12.33 rows=44 width=0) (actual time=10.725..10.726 rows=33759 loops=1) Index Cond: (properties ? 'type'::text) Planning Time: 0.252 ms Execution Time: 46.099 ms (7 rows)
Apparently ?
and jsonb_exists()
don't get the same treatment from the query optimizer:
We have the same problem, where using ?
conflicts with the parameter replacement. Fortunately, this stack overflow ticket also provides a workaround by defining a postgres function that uses the operator, and can be inlined.
arvados_development=# CREATE OR REPLACE FUNCTION jb_contains(jsonb, text) arvados_development-# RETURNS bool AS arvados_development-# 'SELECT $1 ? $2' LANGUAGE sql IMMUTABLE; CREATE FUNCTION arvados_development=# explain analyze select uuid, properties from collections where jb_contains(properties, 'type'); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on collections (cost=12.34..176.59 rows=44 width=112) (actual time=17.727..44.716 rows=33740 loops=1) Recheck Cond: (properties ? 'type'::text) Heap Blocks: exact=4571 -> Bitmap Index Scan on collection_index_on_properties (cost=0.00..12.33 rows=44 width=0) (actual time=14.972..14.972 rows=33759 loops=1) Index Cond: (properties ? 'type'::text) Planning Time: 0.276 ms Execution Time: 46.536 ms (7 rows)
Updated by Peter Amstutz over 1 year ago
20858-jsonb-exists @ c1298836a79c1f3734c95f87f11615daf70806e3
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-08-16 to Development 2023-08-30
Updated by Lucas Di Pentima over 1 year ago
That's a really neat trick, and I have looked for a bug report on jsonb_exists()
no using indexes, but to my surprise, it doesn't seem to be regarded as one.
LGTM.
Updated by Peter Amstutz over 1 year ago
I'm building a 2.6.4~rc1-1 hotfix 9864e5c9448b31355ad2c3fa33453d9007c3f085
This is 2.6.3 with this fix only, for a specific customer being impacted by this issue.
Updated by Peter Amstutz over 1 year ago
- Status changed from In Progress to Resolved