Bug #9831
Closed
[API] Fix ensure_unique_name so finding the right N for "original name (N)" does not take N database queries
Description
Current code uses 6 seconds of database time to save this tiny collection:
{"method":"POST","path":"/arvados/v1/collections","format":"*/*","controller":"arvados/v1/collections","action":"create","status":200,"duration":13288.24,"view":0.29,"db":6257.93,"params":{"collection":"{\"owner_uuid\":\"4xphq-tpzed-8i6n6lswodedghy\",\"name\":\"diagnostics hash output\",\"portable_data_hash\":\"147249c1018bc6803b7b0fe26050b558+54\",\"manifest_text\":\". 2e4088cf8ddfd7eed5700fd1e9c946ae+87+A4049ccbde97ae393d6fa984031c8c3dc98feda78@57cdb846 0:87:md5sum.txt\\n\"}","ensure_unique_name":"true"},"@timestamp":"2016-08-22T18:24:20Z","@version":"1","message":"[200] POST /arvados/v1/collections (arvados/v1/collections#create)"}
Proposed solution
If the provided name is in use:
- Use "name (timestamp)", where timestamp is RFC 3339 with milliseconds.
- If the chosen name is already taken due to a race with a concurrent request, try again, but give up after 10 attempts.
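The proposal above can be sketched roughly as follows. This is a minimal illustration in Python, not the Arvados implementation: `InMemoryStore`, `NameTaken`, `rfc3339_ms`, and `save_with_unique_name` are hypothetical names, the in-memory set stands in for a database uniqueness constraint on (owner, name), and counting the first try with the original name as one of the 10 attempts is an assumption.

```python
from datetime import datetime, timezone

MAX_ATTEMPTS = 10  # per the proposal: give up after 10 attempts


class NameTaken(Exception):
    """Raised by the hypothetical store when (owner, name) already exists."""


class InMemoryStore:
    """Stand-in for a database uniqueness constraint on (owner_uuid, name)."""

    def __init__(self):
        self.names = set()
        self.insert_calls = 0

    def insert(self, owner_uuid, name):
        self.insert_calls += 1
        if (owner_uuid, name) in self.names:
            raise NameTaken(name)
        self.names.add((owner_uuid, name))


def rfc3339_ms(now=None):
    """Format a UTC time as RFC 3339 with millisecond precision."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return stamp.replace("+00:00", "Z")


def save_with_unique_name(store, owner_uuid, name, clock=rfc3339_ms):
    """Try the requested name first, then "name (timestamp)" on each conflict."""
    candidate = name
    for _ in range(MAX_ATTEMPTS):
        try:
            store.insert(owner_uuid, candidate)
            return candidate
        except NameTaken:
            # One write per attempt: no scan over "name (1)", "name (2)", ...
            candidate = f"{name} ({clock()})"
    raise NameTaken(f"gave up after {MAX_ATTEMPTS} attempts: {name}")
```

Under these assumptions the cost is one insert per attempt regardless of how many similarly named collections already exist, instead of one query per candidate N.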
Updated by Tom Morris about 8 years ago
Total time is actually more like 15+ seconds:
15291.06 5773.91 POST create /arvados/v1/collections {"manifest_text":"","name":"New collection","ensure_unique_name":"true","alt":"json","collection":{"manifest_text":"","name":"New collection"}} null
The first column is total request time and the second is database time, both in milliseconds.
Perhaps this is blindingly obvious, but why do collection names need to be unique per user anyway? What happens when a collection is transferred to a different user where there's a name collision?
Updated by Tom Morris about 8 years ago
Tom Morris wrote:
why do collection names need to be unique per user anyway? What happens when a collection is transferred to a different user where there's a name collision?
Transferring Tom's answer: Because otherwise we wouldn't have unique names in the directory hierarchy for the FUSE driver. Note that the "owner" in this case is the owning project, not the user, so there's a misconception in the formulation of my original question.
Updated by Tom Morris almost 8 years ago
- Assigned To set to Tom Morris
- Target version set to Arvados Future Sprints
I've seen collection serial numbers as high as "New collection (8666)".
Updated by Tom Clegg almost 8 years ago
- Category set to API
- Status changed from New to In Progress
- Assigned To changed from Tom Morris to Tom Clegg
- Target version changed from Arvados Future Sprints to 2017-01-18 sprint
- Story points set to 0.5
Updated by Tom Clegg almost 8 years ago
9831-faster-unique-name @ daafdb4c939f265b4604711d0fc946a830d9d54e https://ci.curoverse.com/job/developer-run-tests/134/
Updated by Radhika Chippada almost 8 years ago
@ daafdb4c
Since the timestamp in the name has only millisecond precision (3 digits), I wonder whether the system is fast enough that we might burn through too many retries, or run out of them entirely, when a name collision does occur. Would sleeping for 0.1s or so between retries be desirable?
LGTM otherwise. Thanks.
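For illustration only, the pause suggested above might look like the sketch below. `retry_with_pause` and its parameters are hypothetical, and nothing in this thread says whether the merged change sleeps between attempts. The `sleep` argument is injectable so the helper can be exercised without actually waiting.

```python
import time


def retry_with_pause(try_once, attempts=10, pause=0.1, sleep=time.sleep):
    """Call try_once() until it returns a non-None value, pausing between tries.

    Returns None if every attempt fails. No pause follows the final attempt.
    """
    for attempt in range(attempts):
        result = try_once()
        if result is not None:
            return result
        if attempt < attempts - 1:
            # Wait longer than the timestamp resolution (1 ms), so the next
            # candidate name from this process carries a different timestamp.
            sleep(pause)
    return None
```

A pause of 0.1s adds at most 0.9s over 10 attempts. Any pause longer than 1 ms is enough for a single process to get a fresh timestamp, though two concurrent requests could in principle still collide with each other.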
Updated by Tom Clegg almost 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:49db3b740b861688eff2a872c8f69f65ee893ed2.