Bug #14075
Closed: extremely long running database query
Description
Our system has developed a new problem and seems unable to answer the following query within the API server's time limit:
```
SELECT collections."uuid", collections."owner_uuid", collections."created_at", collections."modified_by_client_uuid", collections."modified_by_user_uuid", collections."modified_at", collections."name", collections."description", collections."properties", collections."portable_data_hash", collections."replication_desired", collections."replication_confirmed", collections."replication_confirmed_at", collections."storage_classes_desired", collections."storage_classes_confirmed", collections."storage_classes_confirmed_at", collections."delete_at", collections."trash_at", collections."is_trashed"
FROM "collections"
WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false)
  AND ((collections.portable_data_hash = 'd41d8cd98f00b204e9800998ecf8427e+0') AND (collections.name like 'dict_to_interval_list.cwl dependencies%'))
ORDER BY collections.modified_at desc, collections.uuid
LIMIT 1 OFFSET 0;
```
Updated by Joshua Randall over 6 years ago
```
arvados_api_production=# explain analyze SELECT collections."uuid", collections."owner_uuid", collections."created_at", collections."modified_by_client_uuid", collections."modified_by_user_uuid", collections."modified_at", collections."name", collections."description", collections."properties", collections."portable_data_hash", collections."replication_desired", collections."replication_confirmed", collections."replication_confirmed_at", collections."storage_classes_desired", collections."storage_classes_confirmed", collections."storage_classes_confirmed_at", collections."delete_at", collections."trash_at", collections."is_trashed" FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND ((collections.portable_data_hash = 'd41d8cd98f00b204e9800998ecf8427e+0') AND (collections.name like 'dict_to_interval_list.cwl dependencies%')) ORDER BY collections.modified_at desc, collections.uuid LIMIT 1 OFFSET 0;
                                                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..209440.28 rows=1 width=892) (actual time=272684.420..272684.420 rows=0 loops=1)
   ->  Nested Loop Anti Join  (cost=0.56..4188794.86 rows=20 width=892) (actual time=272684.419..272684.419 rows=0 loops=1)
         Join Filter: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
         ->  Index Scan using index_collections_on_modified_at_uuid on collections  (cost=0.56..4188793.04 rows=23 width=892) (actual time=272684.416..272684.416 rows=0 loops=1)
               Filter: ((NOT is_trashed) AND ((name)::text ~~ 'dict_to_interval_list.cwl dependencies%'::text) AND ((portable_data_hash)::text = 'd41d8cd98f00b204e9800998ecf8427e+0'::text))
               Rows Removed by Filter: 23412631
         ->  Materialize  (cost=0.00..1.48 rows=1 width=28) (never executed)
               ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=28) (never executed)
                     Filter: (trashed = 1)
 Planning time: 4.833 ms
 Execution time: 272684.518 ms
(11 rows)
```
```
                                           Table "public.collections"
            Column            |            Type             |                        Modifiers
------------------------------+-----------------------------+----------------------------------------------------------
 id                           | integer                     | not null default nextval('collections_id_seq'::regclass)
 owner_uuid                   | character varying(255)      |
 created_at                   | timestamp without time zone | not null
 modified_by_client_uuid      | character varying(255)      |
 modified_by_user_uuid        | character varying(255)      |
 modified_at                  | timestamp without time zone |
 portable_data_hash           | character varying(255)      |
 replication_desired          | integer                     |
 replication_confirmed_at     | timestamp without time zone |
 replication_confirmed        | integer                     |
 updated_at                   | timestamp without time zone | not null
 uuid                         | character varying(255)      |
 manifest_text                | text                        |
 name                         | character varying(255)      |
 description                  | character varying(524288)   |
 properties                   | jsonb                       |
 delete_at                    | timestamp without time zone |
 file_names                   | character varying(8192)     |
 trash_at                     | timestamp without time zone |
 is_trashed                   | boolean                     | not null default false
 storage_classes_desired      | jsonb                       | default '["default"]'::jsonb
 storage_classes_confirmed    | jsonb                       | default '[]'::jsonb
 storage_classes_confirmed_at | timestamp without time zone |
Indexes:
    "collections_pkey" PRIMARY KEY, btree (id)
    "index_collections_on_owner_uuid_and_name" UNIQUE, btree (owner_uuid, name) WHERE is_trashed = false
    "index_collections_on_uuid" UNIQUE, btree (uuid)
    "collection_index_on_properties" gin (properties)
    "collections_full_text_search_idx" gin (to_tsvector('english'::regconfig, (((((((((((((((COALESCE(owner_uuid, ''::character varying)::text || ' '::text) || COALESCE(modified_by_client_uuid, ''::character varying)::text) || ' '::text) || COALESCE(modified_by_user_uuid, ''::character varying)::text) || ' '::text) || COALESCE(portable_data_hash, ''::character varying)::text) || ' '::text) || COALESCE(uuid, ''::character varying)::text) || ' '::text) || COALESCE(name, ''::character varying)::text) || ' '::text) || COALESCE(description, ''::character varying)::text) || ' '::text) || COALESCE(properties::text, ''::text)) || ' '::text) || COALESCE(file_names, ''::character varying)::text))
    "collections_search_index" btree (owner_uuid, modified_by_client_uuid, modified_by_user_uuid, portable_data_hash, uuid, name)
    "index_collections_on_created_at" btree (created_at)
    "index_collections_on_delete_at" btree (delete_at)
    "index_collections_on_is_trashed" btree (is_trashed)
    "index_collections_on_modified_at" btree (modified_at)
    "index_collections_on_modified_at_uuid" btree (modified_at DESC, uuid)
    "index_collections_on_name" btree (name)
    "index_collections_on_owner_uuid" btree (owner_uuid)
    "index_collections_on_portable_data_hash" btree (portable_data_hash)
    "index_collections_on_portable_data_hash_is_trashed" btree (portable_data_hash, is_trashed)
    "index_collections_on_portable_data_hash_trash_at_desc" btree (portable_data_hash, trash_at DESC)
    "index_collections_on_trash_at" btree (trash_at)
```
Updated by Joshua Randall over 6 years ago
```
arvados_api_production=# select count(*) from collections where portable_data_hash = 'd41d8cd98f00b204e9800998ecf8427e+0';
 count
--------
 299122
(1 row)
```
Updated by Joshua Randall over 6 years ago
This appears to be a "low limit" problem for the postgres query planner.
Changing the limit from 1 to 100 makes the query hundreds of times faster:
```
arvados_api_production=# explain analyze SELECT collections."uuid", collections."owner_uuid", collections."created_at", collections."modified_by_client_uuid", collections."modified_by_user_uuid", collections."modified_at", collections."name", collections."description", collections."properties", collections."portable_data_hash", collections."replication_desired", collections."replication_confirmed", collections."replication_confirmed_at", collections."storage_classes_desired", collections."storage_classes_confirmed", collections."storage_classes_confirmed_at", collections."delete_at", collections."trash_at", collections."is_trashed" FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND ((collections.portable_data_hash = 'd41d8cd98f00b204e9800998ecf8427e+0') AND (collections.name like 'dict_to_interval_list.cwl dependencies%')) ORDER BY collections.modified_at desc, collections.uuid LIMIT 100 OFFSET 0;
                                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=376075.80..376075.85 rows=20 width=892) (actual time=1325.066..1325.066 rows=0 loops=1)
   ->  Sort  (cost=376075.80..376075.85 rows=20 width=892) (actual time=1325.064..1325.064 rows=0 loops=1)
         Sort Key: collections.modified_at DESC, collections.uuid
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Anti Join  (cost=5561.38..376075.37 rows=20 width=892) (actual time=1325.054..1325.054 rows=0 loops=1)
               Join Filter: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
               ->  Bitmap Heap Scan on collections  (cost=5561.38..376073.55 rows=23 width=892) (actual time=1325.053..1325.053 rows=0 loops=1)
                     Recheck Cond: ((portable_data_hash)::text = 'd41d8cd98f00b204e9800998ecf8427e+0'::text)
                     Filter: ((NOT is_trashed) AND ((name)::text ~~ 'dict_to_interval_list.cwl dependencies%'::text))
                     Rows Removed by Filter: 103041
                     Heap Blocks: exact=55865
                     ->  Bitmap Index Scan on index_collections_on_portable_data_hash_is_trashed  (cost=0.00..5561.37 rows=227481 width=0) (actual time=42.050..42.050 rows=103041 loops=1)
                           Index Cond: (((portable_data_hash)::text = 'd41d8cd98f00b204e9800998ecf8427e+0'::text) AND (is_trashed = false))
               ->  Materialize  (cost=0.00..1.48 rows=1 width=28) (never executed)
                     ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=28) (never executed)
                           Filter: (trashed = 1)
 Planning time: 0.646 ms
 Execution time: 1325.147 ms
(18 rows)
```
This uses the sensible index `index_collections_on_portable_data_hash_is_trashed`, whereas the `LIMIT 1` query continues to use the unrelated index `index_collections_on_modified_at_uuid`, which it ends up scanning in its entirety.
```
arvados_api_production=# explain analyze SELECT collections."uuid", collections."owner_uuid", collections."created_at", collections."modified_by_client_uuid", collections."modified_by_user_uuid", collections."modified_at", collections."name", collections."description", collections."properties", collections."portable_data_hash", collections."replication_desired", collections."replication_confirmed", collections."replication_confirmed_at", collections."storage_classes_desired", collections."storage_classes_confirmed", collections."storage_classes_confirmed_at", collections."delete_at", collections."trash_at", collections."is_trashed" FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND ((collections.portable_data_hash = 'd41d8cd98f00b204e9800998ecf8427e+0') AND (collections.name like 'dict_to_interval_list.cwl dependencies%')) ORDER BY collections.modified_at desc, collections.uuid LIMIT 1 OFFSET 0;
                                                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..209440.48 rows=1 width=892) (actual time=232428.369..232428.369 rows=0 loops=1)
   ->  Nested Loop Anti Join  (cost=0.56..4188798.86 rows=20 width=892) (actual time=232428.366..232428.366 rows=0 loops=1)
         Join Filter: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
         ->  Index Scan using index_collections_on_modified_at_uuid on collections  (cost=0.56..4188797.04 rows=23 width=892) (actual time=232428.365..232428.365 rows=0 loops=1)
               Filter: ((NOT is_trashed) AND ((name)::text ~~ 'dict_to_interval_list.cwl dependencies%'::text) AND ((portable_data_hash)::text = 'd41d8cd98f00b204e9800998ecf8427e+0'::text))
               Rows Removed by Filter: 23412797
         ->  Materialize  (cost=0.00..1.48 rows=1 width=28) (never executed)
               ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=28) (never executed)
                     Filter: (trashed = 1)
 Planning time: 0.758 ms
 Execution time: 232428.447 ms
(11 rows)
```
Updated by Joshua Randall over 6 years ago
This query can be made much faster by adding an extraneous column to the order by clause:
```
arvados_api_production=# explain analyze SELECT collections."uuid", collections."owner_uuid", collections."created_at", collections."modified_by_client_uuid", collections."modified_by_user_uuid", collections."modified_at", collections."name", collections."description", collections."properties", collections."portable_data_hash", collections."replication_desired", collections."replication_confirmed", collections."replication_confirmed_at", collections."storage_classes_desired", collections."storage_classes_confirmed", collections."storage_classes_confirmed_at", collections."delete_at", collections."trash_at", collections."is_trashed" FROM "collections" WHERE (NOT EXISTS(SELECT 1 FROM materialized_permission_view WHERE trashed = 1 AND (collections.owner_uuid = target_uuid)) AND collections.is_trashed = false) AND ((collections.portable_data_hash = 'd41d8cd98f00b204e9800998ecf8427e+0') AND (collections.name like 'dict_to_interval_list.cwl dependencies%')) ORDER BY collections.modified_at desc, collections.uuid, collections.replication_confirmed_at LIMIT 1 OFFSET 0;
                                                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=337792.41..337792.41 rows=1 width=893) (actual time=4646.366..4646.366 rows=0 loops=1)
   ->  Sort  (cost=337792.41..337792.46 rows=18 width=893) (actual time=4646.365..4646.365 rows=0 loops=1)
         Sort Key: collections.modified_at DESC, collections.uuid, collections.replication_confirmed_at
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Anti Join  (cost=4916.75..337792.32 rows=18 width=893) (actual time=4646.310..4646.310 rows=0 loops=1)
               Join Filter: ((collections.owner_uuid)::text = (materialized_permission_view.target_uuid)::text)
               ->  Bitmap Heap Scan on collections  (cost=4916.75..337790.54 rows=20 width=893) (actual time=4646.309..4646.309 rows=0 loops=1)
                     Recheck Cond: ((portable_data_hash)::text = 'd41d8cd98f00b204e9800998ecf8427e+0'::text)
                     Filter: ((NOT is_trashed) AND ((name)::text ~~ 'dict_to_interval_list.cwl dependencies%'::text))
                     Rows Removed by Filter: 103041
                     Heap Blocks: exact=55865
                     ->  Bitmap Index Scan on index_collections_on_portable_data_hash_is_trashed  (cost=0.00..4916.74 rows=201218 width=0) (actual time=121.372..121.372 rows=103041 loops=1)
                           Index Cond: (((portable_data_hash)::text = 'd41d8cd98f00b204e9800998ecf8427e+0'::text) AND (is_trashed = false))
               ->  Materialize  (cost=0.00..1.48 rows=1 width=28) (never executed)
                     ->  Seq Scan on materialized_permission_view  (cost=0.00..1.48 rows=1 width=28) (never executed)
                           Filter: (trashed = 1)
 Planning time: 4.644 ms
 Execution time: 4646.433 ms
(18 rows)
```
Unfortunately, the `uuid` column, which normally fills this role, cannot do so here because there is a combined index on `modified_at` and `uuid`.
Updated by Joshua Randall over 6 years ago
N.B. this query returns no rows because although there are nearly 300k collections with the empty portable_data_hash, none have the name pattern given.
Updated by Joshua Randall over 6 years ago
I think as long as the `name like '<name>%'` predicate is in the query, it is going to end up doing a scan of either an index or the table.
So, the real question here is who is doing this and can it be stopped?
The answer appears to be arvados-cwl-runner: https://github.com/wtsi-hgi/arvados/blob/688319dca9cbbb27452a0062943bb38b6935254e/sdk/cwl/arvados_cwl/runner.py#L332
There are at least two issues with that a-c-r function:
- the f.write() return code is not checked, so the function continues to run after failing to write the `workflow.cwl` data to keep (this is why we have hundreds of thousands of empty collections saved)
- the inclusion of the `["name", "like", name+"%"]` filter seems entirely extraneous; surely we do not care what the collection is called: if it has the contents we want, we can use it, right? Changing the name `like` predicate to an `=` predicate makes the list db query 120000x faster, and dropping the name predicate entirely makes it 150x faster.
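A minimal sketch of the filter change being proposed (the `build_filters` helper is hypothetical; the triples follow the Arvados list-API filter convention used in the quoted code):

```python
def build_filters(portable_data_hash, name=None):
    # Filter on the content hash, which is what actually matters for reuse.
    filters = [["portable_data_hash", "=", portable_data_hash]]
    if name:
        # Exact match instead of the original ["name", "like", name + "%"]:
        # an "=" predicate can be answered from the btree index on name
        # without filtering every row that merely shares the prefix.
        filters.append(["name", "=", name])
    return filters

filters = build_filters("d41d8cd98f00b204e9800998ecf8427e+0",
                        "dict_to_interval_list.cwl dependencies")
```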
Updated by Joshua Randall over 6 years ago
Hmm, it seems like the `__exit__()` called when leaving the write context should raise an exception from f.close() if the commit fails to write to keep, but somehow we are winding up with queries for an empty collection.
I think I understand the reasoning for the filter parameter on name - is it because writable collections can change their portable_data_hash and we want to be sure there is a persistent collection that "belongs" to us that has the desired portable_data_hash?
If that is the case, I still do not see why we need to use `like` - can we just replace this scheme with one named after the contents of the collection?
I.e. rather than calling `collection.save_new()` with `ensure_unique_name=True` (which I assume does some integer autoincrement nonsense to avoid name collisions) we could just calculate the md5sum of `json.dumps(packed, indent=2, sort_keys=True, separators=(',',': '))` and name the collection `workflow.cwl <md5sum>`???
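A sketch of that content-addressed naming scheme (assuming `packed` is the dict holding the packed workflow; `content_addressed_name` is a made-up helper, and the serialization arguments are exactly those quoted above):

```python
import hashlib
import json

def content_addressed_name(packed):
    # Serialize deterministically (sorted keys, fixed separators) so that
    # identical workflow content always produces the same name, making
    # name collisions impossible without ensure_unique_name.
    text = json.dumps(packed, indent=2, sort_keys=True, separators=(',', ': '))
    digest = hashlib.md5(text.encode('utf-8')).hexdigest()
    return "workflow.cwl %s" % digest
```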
Updated by Joshua Randall over 6 years ago
@tetron pointed out in IRC that the code we referenced is only being called when using RunInSingleContainer, which we are not using. He pointed us to the real place where our query is likely coming from: https://github.com/wtsi-hgi/arvados/blob/688319dca9cbbb27452a0062943bb38b6935254e/sdk/python/arvados/commands/run.py#L212
Updated by Peter Amstutz over 6 years ago
This is the "test if a similar collection exists" query, so removing the "name like" part of the query means the definition of "similar" becomes a bit broader (but it still includes portable_data_hash, which is the only thing that actually matters for correctness).
arvados/sdk/python/commands/run.py:211
```
if name:
    filters.append(["name", "like", name+"%"])
```
arvados/sdk/cwl/arvados_cwl/runner.py:338
```
filters = [["portable_data_hash", "=", collection.portable_data_hash()], ["name", "like", name+"%"]]
```
Updated by Peter Amstutz over 6 years ago
From IRC:
Suggested solution is
- Tack on the PDH of the collection (after everything has been written) to the end of the "name"
- Do an exact match on name
So this would become
[["name", "=", "dict_to_interval_list.cwl dependencies d41d8cd98f00b204e9800998ecf8427e+0"]]
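A sketch of how that filter would be built (assuming `name` and the collection's PDH are already in hand; `reuse_filters` is a hypothetical helper):

```python
def reuse_filters(name, portable_data_hash):
    # Tack the PDH onto the end of the name (after everything has been
    # written), then match exactly; no "like" prefix scan is needed.
    full_name = "%s %s" % (name, portable_data_hash)
    return [["name", "=", full_name]]

filters = reuse_filters("dict_to_interval_list.cwl dependencies",
                        "d41d8cd98f00b204e9800998ecf8427e+0")
```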
Updated by Peter Amstutz over 6 years ago
Separately it seems like it is also creating unnecessary empty collections, so there might be a simplification there as well.
Updated by Joshua Randall over 6 years ago
It actually looks like it is not creating the empty collections because of the condition on https://github.com/curoverse/arvados/blob/ad5f111b67e2bf1c43ce9bc37e6b11b8c753f62d/sdk/python/arvados/commands/run.py#L222:
```
elif len(collection) > 0:
    collection.save_new(name=name, owner_uuid=project, ensure_unique_name=True)
    pdh = collection.portable_data_hash()
    logger.info("Uploaded to %s (%s)", pdh, collection.manifest_locator())
```
So, a-c-r is doing the expensive (at least on our system) query to see if the collection exists, but then does not actually create it if it doesn't exist, meaning that all of these queries hit the worst-case scenario (they would be a lot faster if they had something to find, but they never do).
I'll implement a hotfix that moves this test up to line 210 and adds an else block that sets pdh to the empty-collection PDH (`pdh = collection.portable_data_hash()`) when there are no files in the dependencies collection.
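The hotfix works because an empty collection always has the same PDH, so it can be produced without touching the database at all. A minimal sketch (the construction follows the Arvados PDH convention of md5-of-manifest plus "+" plus the manifest's byte length; `empty_collection_pdh` is an illustrative helper):

```python
import hashlib

def empty_collection_pdh():
    # An empty collection has an empty manifest text, so its PDH is the
    # md5 of the empty string plus "+0", i.e. the hash seen throughout
    # this ticket: d41d8cd98f00b204e9800998ecf8427e+0.
    manifest = b""
    return "%s+%d" % (hashlib.md5(manifest).hexdigest(), len(manifest))
```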
Updated by Joshua Randall over 6 years ago
This fix appears to be working for us: https://github.com/curoverse/arvados/pull/78
N.B. this is based on a commit from June which is what we are running in production.
Updated by Tom Morris over 6 years ago
- Status changed from New to Closed
- Assigned To set to Joshua Randall
- Target version set to 2018-09-19 Sprint
The fix for this was merged in 5b8dfb5d5f4b34fa730e69c45c517f0cfb3fdc9f