Bug #21205
closedensure_unique_name logic is not robust enough
Description
User reported "duplicate key value violates unique constraint index_collections_on_owner_and_name".
This error occurred from arvados-cwl-runner creating a collection called "Intermediate collection".
This case does supply the ensure_unique_name flag.
However, it seems that between running 100s of workflows, and this particular step doing a 24-way scatter, that it attempting to create ~10 collections per second (see screenshot), and that, while < 1% chance, it is evidently possible for two or more collections to be created at the same time and for save_with_unique_name! to fail enough times in a row that one of those collection create operations fails with a uniqueness error.
From discussion:
Instead of adding a timestamp, we should add ~10 random alphanumeric digits (similar to the logic used for UUIDs). As demonstrated, timestamps are prone to collision, and using millisecond precision that means we are limited to creating 1000 collections per second.
Files
Updated by Peter Amstutz about 1 year ago
- Status changed from New to In Progress
Updated by Peter Amstutz about 1 year ago
- Status changed from In Progress to New
- Category set to API
Updated by Peter Amstutz about 1 year ago
- Description updated (diff)
- File Screenshot from 2023-11-20 10-37-11.png Screenshot from 2023-11-20 10-37-11.png added
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-29 sprint to Development 2024-01-03 sprint
Updated by Peter Amstutz about 1 year ago
21205-ensure-unique @ ceb50311bd12b48985e8212921d914fa3e8051a0
Updated by Peter Amstutz about 1 year ago
- Status changed from New to In Progress
Updated by Peter Amstutz about 1 year ago
21205-ensure-unique @ c3872e7a1c817cd39b702f694f70d34f28f7f472
- All agreed upon points are implemented / addressed.
- uses random characters instead of a timestamp to make the names unique
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described
- existing tests passing
- Documentation has been updated.
- n/a
- Behaves appropriately at the intended scale (describe intended scale).
- should help at large scale as timestamp collisions are no longer possible
- Considered backwards and forwards compatibility issues between client and server.
- the number of characters replaced at the end are less than before, so the
arvados.util.trim_name
method is still valid (it just trims a bit more than it needs to).
- the number of characters replaced at the end are less than before, so the
- Follows our coding standards and GUI style guidelines.
- yes
When creating a new record, this should always succeed -- in the unlikely event that a record already existed with the same uuid, it would roll a new uuid.
When un-trashing a record, it could still fail if there was another record that already had the same name as what the ensure_unique_name
logic assigns (this would involve adding the last 15 characters of uuid for the record that is going to be un-trashed). However, I cannot think of any normal usage scenario where this would come up, it would only happen under a contrived circumstance designed to cause a collision.
Updated by Lucas Di Pentima about 1 year ago
Updated by Lucas Di Pentima about 1 year ago
- File
services/api/app/models/arvados_model.rb
- Line 506: "the uuid caused a name collision" comment makes me think that
save_with_unique_name!
is intended to include the uuid's last chars on the name by default and not as a fallback behavior, butadd_uuid_to_make_unique
is only called on the fallback code block, so not sure which is the valid one. Personally I think that adding the last UUID chars by default is the best option for performance reasons. - If the above was indeed meant to include the UUID chars by default, I think we need a specific test exercising that new behavior.
- Line 506: "the uuid caused a name collision" comment makes me think that
- File
services/api/test/unit/container_request_test.rb
- Line 1134: "...plus the date" comment is now outdated
The rest LGTM.
Updated by Peter Amstutz about 1 year ago
Lucas Di Pentima wrote in #note-12:
- File
services/api/app/models/arvados_model.rb
- Line 506: "the uuid caused a name collision" comment makes me think that
save_with_unique_name!
is intended to include the uuid's last chars on the name by default and not as a fallback behavior, butadd_uuid_to_make_unique
is only called on the fallback code block, so not sure which is the valid one. Personally I think that adding the last UUID chars by default is the best option for performance reasons.- If the above was indeed meant to include the UUID chars by default, I think we need a specific test exercising that new behavior.
The code retains the preexisting behavior, which only changes the name when there is a collision. You're right, in situations where we're creating a lot of collections using the same base name, it would avoid going through the fail-and-retry every time.
On the other hand, in situations where a collision is unlikely (but we want to ensure the collection creation succeeds even when there is a collision), it would still always add extra text to the end of the name, which might not be desirable. It is a tradeoff. One solution would be to add a new option with the behavior that it always adds the extra text -- but at the moment we don't have any indication that the fail-and-retry logic is actually a performance problem.
- File
services/api/test/unit/container_request_test.rb
- Line 1134: "...plus the date" comment is now outdated
Fixed.
The rest LGTM.
Thanks.
Updated by Peter Amstutz about 1 year ago
- Status changed from In Progress to Resolved