Bug #15902
closedVery slow query for large workflows
Description
A customer reports that the query below takes 32 seconds for a ~3,000 container workflow:
duration: 32570.970 ms execute <unnamed>: SELECT container_requests."uuid", container_requests."owner_uuid", container_requests."created_at", container_requests."modified_at", container_requests."modified_by_client_uuid", container_requests."modified_by_user_uuid", container_requests."name", container_requests."description", container_requests."properties", container_requests."state", container_requests."requesting_container_uuid", container_requests."container_uuid", container_requests."container_count_max", container_requests."runtime_constraints", container_requests."container_image", container_requests."environment", container_requests."cwd", container_requests."command", container_requests."output_path", container_requests."priority", container_requests."expires_at", container_requests."filters", container_requests."container_count", container_requests."use_existing", container_requests."scheduling_parameters", container_requests."output_uuid", container_requests."log_uuid", container_requests."output_name", container_requests."output_ttl"
FROM "container_requests"
WHERE (container_requests.owner_uuid NOT IN (SELECT target_uuid FROM materialized_permission_view WHERE trashed = 1) ) AND (container_requests.requesting_container_uuid in ('zzzzz-dz642-r9n91uyk20xowyf',...[~3000 UUIDs omitted]
) ORDER BY container_requests.modified_at desc, container_requests.uuid LIMIT $1 OFFSET $2
< 2019-11-26 10:50:27.162 EST >DETAIL: parameters: $1 = '1000', $2 = '0'
Here's the EXPLAIN ANALYZE for the original query:
Limit (cost=2385.09..16840.05 rows=1000 width=791) (actual time=32385.658..32385.658 rows=0 loops=1)
-> Index Scan using index_container_requests_on_modified_at_uuid on container_requests (cost=2385.09..1995593.21 rows=137891 width=791) (actual time=32385.656..32385.656 rows=0 loops=1)
Filter: ((NOT (hashed SubPlan 1)) AND ((requesting_container_uuid)::text = ANY ('{zzzzz-dz642-r9n91uyk20xowyf,...}'::text[])))
Rows Removed by Filter: 376597
SubPlan 1
-> Index Only Scan using permission_target_trashed on materialized_permission_view (cost=0.42..2380.40 rows=1709 width=28) (actual time=0.050..0.351 rows=95 loops=1)
Index Cond: (trashed = 1)
Heap Fetches: 95
Planning time: 29.160 ms
Execution time: 32385.751 ms
Deleting the index that the query planner incorrectly chose, index_container_requests_on_modified_at_uuid, dropped the query time from 30+ seconds to ~50 msec.
Limit (cost=83535.57..83538.07 rows=1000 width=780) (actual time=54.419..54.419 rows=0 loops=1)
-> Sort (cost=83535.57..83877.22 rows=136657 width=780) (actual time=54.417..54.417 rows=0 loops=1)
Sort Key: container_requests.modified_at DESC, container_requests.uuid
Sort Method: quicksort Memory: 25kB
-> Index Scan using index_container_requests_on_requesting_container_uuid on container_requests (cost=635.83..76042.82 rows=136657 width=780) (actual time=54.323..54.323 rows=0 loops=1)
Before this was done, a full VACUUM ANALYZE had no affect on performance, but dropping the ORDER BY clause from the query also dramatically improved the query times.
One thing that seems strange is that the query is ordering by not only modified_at
, but also UUID which doesn't really seem to add any value.
We should see if we can adjust the indexes and/or query in a way which the query planner can optimize correctly.
Updated by Peter Amstutz about 5 years ago
- Target version changed from 2020-01-02 Sprint to 2020-01-15 Sprint
Updated by Tom Clegg about 5 years ago
- Assigned To changed from Peter Amstutz to Tom Clegg
Updated by Tom Clegg about 5 years ago
- Status changed from New to In Progress
Deleting the (modified_at desc, uuid) index doesn't seem appealing -- it's likely to make different queries slow.
It may be possible to tune postgresql's statistics / query planner so it chooses the appropriate index.
I suspect the easiest short term solution/workaround is for the client to specify order=uuid or order=(requesting_container_uuid, uuid), rather than using the default order=(modified_at desc, uuid).
I'm guessing the client is workbench or workbench2?
Updated by Peter Amstutz about 5 years ago
- Target version changed from 2020-01-15 Sprint to 2020-01-29 Sprint
Updated by Tom Clegg about 5 years ago
- Related to Bug #15672: List of subprocesses limited in process view added
Updated by Lucas Di Pentima about 5 years ago
I have checked wb2 code and I haven't found code making a request with a 'requesting_container_uuid in [some list]' filter. In fact, current wb2's master branch would fail to show a big workflow because it makes a request with a filter too big to be used on a GET query.
Updated by Tom Clegg almost 5 years ago
15902-slow-query @ 9fc2ff5583fc31c9f75f80dcf7258bed63084568 -- developer-run-tests: #1710
Updated by Lucas Di Pentima almost 5 years ago
This LGTM, waiting for tests to finish before merging.
Updated by Lucas Di Pentima almost 5 years ago
Re-running wb integration tests just in case
Updated by Anonymous almost 5 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|405017353d0a74ef28288744957fd362df98c0e2.