Actions
Bug #8289
closed[API] Avoid making queries unnecessarily slow by adding superfluous "order by" columns
Story points:
0.5
Description
As an example, when the Python SDK's websocket fallback PollClient starts up, it does an API request with order="id asc". API server translates that to a PostgreSQL query like this:
arvados_production=# SELECT "logs".* FROM "logs" WHERE ((logs.event_type in ('create','update','delete')) AND (logs.id > '3464274')) ORDER BY logs.id desc, logs.modified_at desc, logs.uuid LIMIT 1 OFFSET 0; arvados_production=# explain analyze SELECT "logs".* FROM "logs" WHERE ((logs.event_type in ('create','update','delete')) AND (logs.id > '3464274')) ORDER BY logs.id desc, logs.modified_at desc, logs.uuid LIMIT 1 OFFSET 0; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=822081.89..822081.89 rows=1 width=1290) (actual time=4007.367..4007.368 rows=1 loops=1) -> Sort (cost=822081.89..823218.65 rows=454703 width=1290) (actual time=4007.364..4007.364 rows=1 loops=1) Sort Key: id, modified_at, uuid Sort Method: top-N heapsort Memory: 27kB -> Bitmap Heap Scan on logs (cost=69925.24..819808.37 rows=454703 width=1290) (actual time=531.924..3086.689 rows=869534 loops=1) Recheck Cond: ((id > 3464274) AND ((event_type)::text = ANY ('{create,update,delete}'::text[]))) -> BitmapAnd (cost=69925.24..69925.24 rows=454703 width=0) (actual time=530.935..530.935 rows=0 loops=1) -> Bitmap Index Scan on logs_pkey (cost=0.00..29223.51 rows=1433428 width=0) (actual time=249.940..249.940 rows=1954308 loops=1) Index Cond: (id > 3464274) -> Bitmap Index Scan on index_logs_on_event_type (cost=0.00..40474.13 rows=1341921 width=0) (actual time=270.632..270.632 rows=1735755 loops=1) Index Cond: ((event_type)::text = ANY ('{create,update,delete}'::text[])) Total runtime: 4007.417 ms
"id" is a primary key, so the "logs.modified_at desc, logs.uuid" part of the "order by" clause can't possibly make a difference to the results. If we remove them, the query looks like this:
arvados_production=# explain analyze SELECT "logs".* FROM "logs" WHERE ((logs.event_type in ('create','update','delete')) AND (logs.id > '3464274')) ORDER BY logs.id desc LIMIT 1 OFFSET 0; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..8.67 rows=1 width=1290) (actual time=0.049..0.050 rows=1 loops=1) -> Index Scan Backward using logs_pkey on logs (cost=0.00..3943415.80 rows=454703 width=1290) (actual time=0.046..0.046 rows=1 loops=1) Index Cond: (id > 3464274) Filter: ((event_type)::text = ANY ('{create,update,delete}'::text[])) Total runtime: 0.078 ms
Actions