Bug #13924

Timeout when getting collection by portable data hash

Added by Joshua Randall almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tom Clegg
Category: API
Target version: 2018-09-05 Sprint
Start date: 08/20/2018
Due date: -
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release: 13
Release relationship: Auto

Description

All of our workflows are failing because attempts to fetch certain collections by portable data hash take longer than the timeout (now raised to 10 minutes).

This appears to be entirely because of the database query time:

{"method":"GET","path":"/arvados/v1/collections/3184b10f225a48f1c79cbb29dc2ab9c7%2B333031","format":"json","controller":"Arvados::V1::CollectionsController","action":"show","status":422,"duration":600051.83,"view":2.99,"db":600028.24,"request_id":"req-m410ox3fssr8bj84uayd","client_ipaddr":"10.101.0.160","client_auth":"eglyx-gj3su-jiyd09zx73wfs1w","params":{"alt":"json"},"@timestamp":"2018-07-26T15:53:10.953384161Z","@version":"1","message":"[422] GET /arvados/v1/collections/3184b10f225a48f1c79cbb29dc2ab9c7%2B333031 (Arvados::V1::CollectionsController#show)"}

The db transaction is (from debug logging):

 User Load (16.9ms)  SELECT "users".* FROM "users" WHERE (redirect_to_user_uuid is null) AND "users"."id" IN (1)
  ContainerRequest Load (34.4ms)  SELECT  (octet_length(container_requests.mounts)) as read_length FROM "container_requests" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (container_requests.owner_uuid = target_uuid)) ) AND ((container_requests.uuid in ('eglyx-xvhdp-qxf113lsdqne8s3','eglyx-xvhdp-u63ooxa2kiho9gq','eglyx-xvhdp-a48hnrbqm5w2xtv','eglyx-xvhdp-sge8hvtds9gd1d9','eglyx-xvhdp-16q264tt9juw9s6','eglyx-xvhdp-cotew5i0g5u1uns','eglyx-xvhdp-k2x7jcifa95apkv','eglyx-xvhdp-mcm4c0hrtg06de7','eglyx-xvhdp-hrrk5wtlsxqauil','eglyx-xvhdp-n56l02h8ae95uhx','eglyx-xvhdp-e6qezgekf0qcn5v')))  ORDER BY container_requests.modified_at desc, container_requests.uuid LIMIT 100 OFFSET 0
  ApiClient Load (16.1ms)  SELECT "api_clients".* FROM "api_clients" WHERE "api_clients"."id" IN (0)
  Container Load (24.0ms)  SELECT  "containers"."uuid", "containers"."priority" FROM "containers" WHERE (auth_uuid='eglyx-gj3su-kdkqdibyw00vmpe')  ORDER BY "containers"."id" ASC LIMIT 1
  ApiClientAuthorization Load (22.4ms)  SELECT  "api_client_authorizations".* FROM "api_client_authorizations" WHERE (api_token='lyoqbyam7w6dioc8a882k5ew2i2y8r0amdtzcbvnhk0jjud6a6' and (expires_at is null or expires_at > CURRENT_TIMESTAMP))  ORDER BY "api_client_authorizations"."id" ASC LIMIT 1
  ContainerRequest Load (41.5ms)  SELECT  "container_requests".* FROM "container_requests" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (container_requests.owner_uuid = target_uuid)) ) AND ((container_requests.uuid in ('eglyx-xvhdp-qxf113lsdqne8s3','eglyx-xvhdp-u63ooxa2kiho9gq','eglyx-xvhdp-a48hnrbqm5w2xtv','eglyx-xvhdp-sge8hvtds9gd1d9','eglyx-xvhdp-16q264tt9juw9s6','eglyx-xvhdp-cotew5i0g5u1uns','eglyx-xvhdp-k2x7jcifa95apkv','eglyx-xvhdp-mcm4c0hrtg06de7','eglyx-xvhdp-hrrk5wtlsxqauil','eglyx-xvhdp-n56l02h8ae95uhx','eglyx-xvhdp-e6qezgekf0qcn5v')))  ORDER BY container_requests.modified_at desc, container_requests.uuid LIMIT 100 OFFSET 0
  User Load (14.7ms)  SELECT "users".* FROM "users" WHERE (redirect_to_user_uuid is null) AND "users"."id" IN (1)
  Collection Load (600041.0ms)  SELECT  "collections".* FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND "collections"."portable_data_hash" = $1  ORDER BY trash_at desc LIMIT 1  [["portable_data_hash", "12d154802d181fcd1a5cae44ec8b3f82+19533"]]

As you can see, the failure is due to the "Collection Load" query taking longer than the 10m timeout.

I tried running this manually in the database with an explain analyze:

arvados_api_production=# explain analyze SELECT  "collections".* FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND "collections"."portable_data_hash" = '12d154802d181fcd1a5cae44ec8b3f82+19533'  ORDER BY trash_at desc LIMIT 1;
                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..515.68 rows=1 width=1282) (actual time=1187700.538..1187700.538 rows=1 loops=1)
   ->  Nested Loop Anti Join  (cost=0.43..1686403.08 rows=3273 width=1282) (actual time=1187700.536..1187700.536 rows=1 loops=1)
         Join Filter: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
         ->  Index Scan Backward using index_collections_on_trash_at on collections  (cost=0.43..1686347.05 rows=3637 width=1282) (actual time=1187700.506..1187700.506 rows=1 loops=1)
               Filter: ((NOT is_trashed) AND ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text))
               Rows Removed by Filter: 4117580
         ->  Materialize  (cost=0.00..1.48 rows=1 width=82) (actual time=0.024..0.024 rows=0 loops=1)
               ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=82) (actual time=0.020..0.020 rows=0 loops=1)
                     Filter: (trashed = 1)
                     Rows Removed by Filter: 38
 Planning time: 0.434 ms
 Execution time: 1187700.604 ms
(12 rows)

I then tried without the subselect, but that wasn't the problem:

arvados_api_production=# explain analyze SELECT  "collections".* FROM "collections" WHERE (collections.is_trashed = false) AND "collections"."portable_data_hash" = '12d154802d181fcd1a5cae44ec8b3f82+19533'  ORDER BY trash_at desc LIMIT 1;
                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..463.76 rows=1 width=1282) (actual time=1099196.070..1099196.070 rows=1 loops=1)
   ->  Index Scan Backward using index_collections_on_trash_at on collections  (cost=0.43..1685103.05 rows=3637 width=1282) (actual time=1099196.068..1099196.068 rows=1 loops=1)
         Filter: ((NOT is_trashed) AND ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text))
         Rows Removed by Filter: 4073842
 Planning time: 0.275 ms
 Execution time: 1099196.114 ms
(6 rows)

Without the 'order by' clause the same query is ~2000000x faster (0.567ms vs 1099s):

arvados_api_production=# explain analyze SELECT  "collections".* FROM "collections" WHERE (collections.is_trashed = false) AND "collections"."portable_data_hash" = '12d154802d181fcd1a5cae44ec8b3f82+19533' LIMIT 1;
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.05 rows=1 width=1282) (actual time=0.528..0.529 rows=1 loops=1)
   ->  Index Scan using index_collections_on_portable_data_hash on collections  (cost=0.56..9047.93 rows=3637 width=1282) (actual time=0.527..0.527 rows=1 loops=1)
         Index Cond: ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text)
         Filter: (NOT is_trashed)
 Planning time: 0.283 ms
 Execution time: 0.567 ms
(6 rows)

Alternatively, if the ordering is important, keeping the 'order by trash_at desc' but adding an extra sort column (uuid) as a tiebreaker improves on the original query by ~250x:

arvados_api_production=# explain analyze SELECT  "collections".* FROM "collections" WHERE (collections.is_trashed = false) AND "collections"."portable_data_hash" = '12d154802d181fcd1a5cae44ec8b3f82+19533'  ORDER BY trash_at desc, uuid LIMIT 1;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8759.36..8759.36 rows=1 width=1282) (actual time=4281.976..4281.977 rows=1 loops=1)
   ->  Sort  (cost=8759.36..8768.45 rows=3637 width=1282) (actual time=4281.975..4281.975 rows=1 loops=1)
         Sort Key: trash_at DESC, uuid
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Bitmap Heap Scan on collections  (cost=118.87..8741.17 rows=3637 width=1282) (actual time=23.846..4277.658 rows=6460 loops=1)
               Recheck Cond: ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text)
               Filter: (NOT is_trashed)
               Heap Blocks: exact=6024
               ->  Bitmap Index Scan on index_collections_on_portable_data_hash  (cost=0.00..117.97 rows=4454 width=0) (actual time=22.723..22.723 rows=6460 loops=1)
                     Index Cond: ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text)
 Planning time: 0.304 ms
 Execution time: 4282.079 ms
(12 rows)

autovacuum is running, but I'll try doing an explicit `vacuum analyze collections` and see if that helps the original query.


Subtasks

Task #14048: Review 13924-collections-index (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Story #14004: [Controller] Keep database in tune (New)

Associated revisions

Revision 067a7263
Added by Tom Clegg almost 2 years ago

Merge branch '13924-collections-index'

closes #13924

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Joshua Randall almost 2 years ago

arvados_api_production=# vacuum analyze verbose collections;
INFO:  vacuuming "public.collections" 
INFO:  scanned index "collections_pkey" to remove 6352 row versions
DETAIL:  CPU 0.60s/0.47u sec elapsed 32.50 sec
INFO:  scanned index "collection_index_on_properties" to remove 6352 row versions
DETAIL:  CPU 0.74s/1.75u sec elapsed 44.35 sec
INFO:  scanned index "collections_full_text_search_idx" to remove 6352 row versions
DETAIL:  CPU 8.28s/14.59u sec elapsed 498.45 sec
INFO:  scanned index "collections_search_index" to remove 6352 row versions
DETAIL:  CPU 6.27s/3.32u sec elapsed 327.48 sec
INFO:  scanned index "index_collections_on_created_at" to remove 6352 row versions
DETAIL:  CPU 0.52s/0.56u sec elapsed 29.73 sec
INFO:  scanned index "index_collections_on_delete_at" to remove 6352 row versions
DETAIL:  CPU 0.65s/0.62u sec elapsed 47.57 sec
INFO:  scanned index "index_collections_on_is_trashed" to remove 6352 row versions
DETAIL:  CPU 0.60s/0.53u sec elapsed 37.17 sec
INFO:  scanned index "index_collections_on_modified_at" to remove 6352 row versions
DETAIL:  CPU 0.54s/0.48u sec elapsed 32.59 sec
INFO:  scanned index "index_collections_on_modified_at_uuid" to remove 6352 row versions
DETAIL:  CPU 2.08s/1.38u sec elapsed 127.23 sec
INFO:  scanned index "index_collections_on_owner_uuid" to remove 6352 row versions
DETAIL:  CPU 1.34s/0.94u sec elapsed 95.31 sec
INFO:  scanned index "index_collections_on_owner_uuid_and_name" to remove 6352 row versions
DETAIL:  CPU 2.42s/1.62u sec elapsed 146.59 sec
INFO:  scanned index "index_collections_on_portable_data_hash" to remove 6352 row versions
DETAIL:  CPU 1.20s/1.18u sec elapsed 80.81 sec
INFO:  scanned index "index_collections_on_trash_at" to remove 6352 row versions
DETAIL:  CPU 0.73s/0.67u sec elapsed 49.06 sec
INFO:  scanned index "index_collections_on_uuid" to remove 6352 row versions
DETAIL:  CPU 1.16s/1.22u sec elapsed 66.31 sec
INFO:  "collections": removed 6352 row versions in 4654 pages
DETAIL:  CPU 0.06s/0.04u sec elapsed 2.60 sec
INFO:  index "collections_pkey" now contains 9291569 row versions in 29897 pages
DETAIL:  6183 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  index "collection_index_on_properties" now contains 9147181 row versions in 67733 pages
DETAIL:  6831 index row versions were removed.
79 index pages have been deleted, 24278 are currently reusable.
CPU 0.98s/0.47u sec elapsed 57.09 sec.
INFO:  index "collections_full_text_search_idx" now contains 9147181 row versions in 631589 pages
DETAIL:  146156 index row versions were removed.
456 index pages have been deleted, 249097 are currently reusable.
CPU 9.60s/4.27u sec elapsed 512.34 sec.
INFO:  index "collections_search_index" now contains 9295310 row versions in 343921 pages
DETAIL:  5939 index row versions were removed.
5 index pages have been deleted, 5 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.09 sec.
INFO:  index "index_collections_on_created_at" now contains 9295606 row versions in 29938 pages
DETAIL:  5939 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "index_collections_on_delete_at" now contains 9295742 row versions in 47018 pages
DETAIL:  5939 index row versions were removed.
124 index pages have been deleted, 124 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO:  index "index_collections_on_is_trashed" now contains 9295755 row versions in 34678 pages
DETAIL:  5939 index row versions were removed.
82 index pages have been deleted, 82 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  index "index_collections_on_modified_at" now contains 9295923 row versions in 29930 pages
DETAIL:  5939 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_collections_on_modified_at_uuid" now contains 9296333 row versions in 134455 pages
DETAIL:  5939 index row versions were removed.
2 index pages have been deleted, 2 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO:  index "index_collections_on_owner_uuid" now contains 9296570 row versions in 76077 pages
DETAIL:  5939 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "index_collections_on_owner_uuid_and_name" now contains 7547120 row versions in 141724 pages
DETAIL:  1529 index row versions were removed.
275 index pages have been deleted, 249 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  index "index_collections_on_portable_data_hash" now contains 9297196 row versions in 84967 pages
DETAIL:  1241 index row versions were removed.
9 index pages have been deleted, 9 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.04 sec.
INFO:  index "index_collections_on_trash_at" now contains 9297390 row versions in 46545 pages
DETAIL:  1516 index row versions were removed.
310 index pages have been deleted, 310 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "index_collections_on_uuid" now contains 9297627 row versions in 73772 pages
DETAIL:  1536 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "collections": found 3838 removable, 2078387 nonremovable row versions in 186564 out of 859851 pages
DETAIL:  11 dead row versions cannot be removed yet.
There were 87451 unused item pointers.
Skipped 257 pages due to buffer pins.
0 pages are entirely empty.
CPU 39.48s/35.81u sec elapsed 2289.63 sec.
INFO:  vacuuming "pg_toast.pg_toast_16412" 
INFO:  scanned index "pg_toast_16412_index" to remove 12024 row versions
DETAIL:  CPU 0.90s/1.08u sec elapsed 58.33 sec
INFO:  "pg_toast_16412": removed 12024 row versions in 4811 pages
DETAIL:  CPU 0.10s/0.07u sec elapsed 5.23 sec
INFO:  index "pg_toast_16412_index" now contains 20061400 row versions in 57835 pages
DETAIL:  12024 index row versions were removed.
74 index pages have been deleted, 72 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.03 sec.
INFO:  "pg_toast_16412": found 12024 removable, 1324545 nonremovable row versions in 320430 out of 4916276 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 40011 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 7.59s/4.89u sec elapsed 391.22 sec.
INFO:  analyzing "public.collections" 
INFO:  "collections": scanned 75000 of 859851 pages, containing 811226 live rows and 19 dead rows; 75000 rows in sample, 9160549 estimated total rows
VACUUM

After this vacuum analyze, the performance was just as bad:

arvados_api_production=# explain analyze SELECT  "collections".* FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND "collections"."portable_data_hash" = '12d154802d181fcd1a5cae44ec8b3f82+19533'  ORDER BY trash_at desc LIMIT 1;
                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..289.24 rows=1 width=1285) (actual time=1052091.831..1052091.831 rows=1 loops=1)
   ->  Nested Loop Anti Join  (cost=0.43..1672737.31 rows=5792 width=1285) (actual time=1052091.831..1052091.831 rows=1 loops=1)
         Join Filter: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
         ->  Index Scan Backward using index_collections_on_trash_at on collections  (cost=0.43..1672639.29 rows=6436 width=1285) (actual time=1052091.809..1052091.809 rows=1 loops=1)
               Filter: ((NOT is_trashed) AND ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text))
               Rows Removed by Filter: 4124255
         ->  Materialize  (cost=0.00..1.48 rows=1 width=82) (actual time=0.019..0.019 rows=0 loops=1)
               ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=82) (actual time=0.013..0.013 rows=0 loops=1)
                     Filter: (trashed = 1)
                     Rows Removed by Filter: 38
 Planning time: 17.926 ms
 Execution time: 1052091.931 ms
(12 rows)

#2 Updated by Joshua Randall almost 2 years ago

This appears to be one of those "LIMIT 1" (really, low-LIMIT) Postgres planner problems: the slowness also goes away if you ask for unlimited rows, or for 100 rows:

arvados_api_production=# explain analyze SELECT  "collections".* FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND "collections"."portable_data_hash" = '12d154802d181fcd1a5cae44ec8b3f82+19533'  ORDER BY trash_at desc;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=15798.31..15812.79 rows=5792 width=1285) (actual time=5541.410..5542.534 rows=6624 loops=1)
   Sort Key: collections.trash_at DESC
   Sort Method: quicksort  Memory: 6920kB
   ->  Hash Anti Join  (cost=213.20..15436.31 rows=5792 width=1285) (actual time=11.842..5525.536 rows=6624 loops=1)
         Hash Cond: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
         ->  Bitmap Heap Scan on collections  (cost=211.71..15359.29 rows=6436 width=1285) (actual time=11.804..5521.484 rows=6624 loops=1)
               Recheck Cond: ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text)
               Filter: (NOT is_trashed)
               Heap Blocks: exact=6184
               ->  Bitmap Index Scan on index_collections_on_portable_data_hash  (cost=0.00..210.10 rows=7939 width=0) (actual time=10.904..10.904 rows=6624 loops=1)
                     Index Cond: ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text)
         ->  Hash  (cost=1.48..1.48 rows=1 width=82) (actual time=0.020..0.020 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=82) (actual time=0.018..0.018 rows=0 loops=1)
                     Filter: (trashed = 1)
                     Rows Removed by Filter: 38
 Planning time: 0.455 ms
 Execution time: 5542.854 ms
(18 rows)

arvados_api_production=# explain analyze select * from (SELECT  "collections".* FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND "collections"."portable_data_hash" = '12d154802d181fcd1a5cae44ec8b3f82+19533'  ORDER BY trash_at desc LIMIT 100) as c;
                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15678.67..15678.92 rows=100 width=1285) (actual time=3041.766..3041.800 rows=100 loops=1)
   ->  Sort  (cost=15678.67..15693.15 rows=5792 width=1285) (actual time=3041.765..3041.788 rows=100 loops=1)
         Sort Key: collections.trash_at DESC
         Sort Method: top-N heapsort  Memory: 126kB
         ->  Nested Loop Anti Join  (cost=211.71..15457.30 rows=5792 width=1285) (actual time=7.695..3034.227 rows=6624 loops=1)
               Join Filter: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
               ->  Bitmap Heap Scan on collections  (cost=211.71..15359.29 rows=6436 width=1285) (actual time=7.676..3030.198 rows=6624 loops=1)
                     Recheck Cond: ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text)
                     Filter: (NOT is_trashed)
                     Heap Blocks: exact=6184
                     ->  Bitmap Index Scan on index_collections_on_portable_data_hash  (cost=0.00..210.10 rows=7939 width=0) (actual time=6.793..6.793 rows=6624 loops=1)
                           Index Cond: ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text)
               ->  Materialize  (cost=0.00..1.48 rows=1 width=82) (actual time=0.000..0.000 rows=0 loops=6624)
                     ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=82) (actual time=0.014..0.014 rows=0 loops=1)
                           Filter: (trashed = 1)
                           Rows Removed by Filter: 38
 Planning time: 0.393 ms
 Execution time: 3041.865 ms
(18 rows)

#3 Updated by Joshua Randall almost 2 years ago

Adding an index specific to this query makes it ~500,000x faster:

arvados_api_production=# create index index_collections_on_portable_data_hash_trash_at_desc on collections (portable_data_hash, trash_at desc);
CREATE INDEX
arvados_api_production=# explain analyze SELECT  "collections".* FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND "collections"."portable_data_hash" = '12d154802d181fcd1a5cae44ec8b3f82+19533'  ORDER BY trash_at desc LIMIT 1;
                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.35 rows=1 width=1285) (actual time=2.321..2.321 rows=1 loops=1)
   ->  Nested Loop Anti Join  (cost=0.56..16606.85 rows=5948 width=1285) (actual time=2.320..2.320 rows=1 loops=1)
         Join Filter: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
         ->  Index Scan using index_collections_on_portable_data_hash_trash_at_desc on collections  (cost=0.56..16506.24 rows=6609 width=1285) (actual time=2.298..2.298 rows=1 loops=1)
               Index Cond: ((portable_data_hash)::text = '12d154802d181fcd1a5cae44ec8b3f82+19533'::text)
               Filter: (NOT is_trashed)
         ->  Materialize  (cost=0.00..1.48 rows=1 width=82) (actual time=0.020..0.020 rows=0 loops=1)
               ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=82) (actual time=0.018..0.018 rows=0 loops=1)
                     Filter: (trashed = 1)
                     Rows Removed by Filter: 38
 Planning time: 0.509 ms
 Execution time: 2.368 ms
(12 rows)
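The composite index is effective because a single index scan satisfies both the equality condition on portable_data_hash and the ORDER BY trash_at desc, so the first matching row the scan yields is already the row the query wants. A minimal sketch of the same effect using SQLite from Python (a hypothetical schema that only mirrors the column and index names above; this is not the Arvados schema, and SQLite's planner differs from Postgres's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE collections (
        uuid TEXT PRIMARY KEY,
        portable_data_hash TEXT,
        trash_at TEXT,
        is_trashed INTEGER
    )""")
# Composite index: equality column first, then the sort column.
conn.execute("""
    CREATE INDEX index_collections_on_portable_data_hash_trash_at
        ON collections (portable_data_hash, trash_at DESC)""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM collections
    WHERE portable_data_hash = ? AND is_trashed = 0
    ORDER BY trash_at DESC LIMIT 1""",
    ("12d154802d181fcd1a5cae44ec8b3f82+19533",)).fetchall()

# The last column of each plan row is the human-readable detail.
for row in plan:
    print(row[-1])
```

Here EXPLAIN QUERY PLAN reports a search on the composite index with no separate "USE TEMP B-TREE FOR ORDER BY" step; the analogous evidence in the Postgres plan above is the Index Scan node on index_collections_on_portable_data_hash_trash_at_desc replacing the backward scan of index_collections_on_trash_at.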

#4 Updated by Tom Clegg almost 2 years ago

  • Related to Story #14004: [Controller] Keep database in tune added

#5 Updated by Tom Morris almost 2 years ago

  • Target version set to 2018-09-05 Sprint

#6 Updated by Tom Clegg almost 2 years ago

  • Assigned To set to Tom Clegg

#8 Updated by Tom Clegg almost 2 years ago

#9 Updated by Tom Clegg almost 2 years ago

  • Status changed from New to In Progress

#10 Updated by Peter Amstutz almost 2 years ago

Tom Clegg wrote:

13924-collections-index @ 5e2ee106750fb6413ebc463e6c9e7c87cf624780

This looks like it makes index_collections_on_portable_data_hash redundant; can we remove that index?

#11 Updated by Tom Clegg almost 2 years ago

13924-collections-index @ a0446e23b6795c8c90c5e66438a3a1bb82f9368a
  • remove redundant index
  • add missing copyright header

#12 Updated by Peter Amstutz almost 2 years ago

Tom Clegg wrote:

13924-collections-index @ a0446e23b6795c8c90c5e66438a3a1bb82f9368a
  • remove redundant index
  • add missing copyright header

LGTM

#13 Updated by Tom Clegg almost 2 years ago

  • Status changed from In Progress to Resolved

#14 Updated by Ward Vandewege almost 2 years ago

  • Release set to 13
