Bug #9831

[API] Fix ensure_unique_name so finding the right N for "original name (N)" does not take N database queries

Added by Tom Clegg over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Start date: 01/12/2017
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

Current code uses 6 seconds of database time to save this tiny collection:

{"method":"POST","path":"/arvados/v1/collections","format":"*/*","controller":"arvados/v1/collections","action":"create","status":200,"duration":13288.24,"view":0.29,"db":6257.93,"params":{"collection":"{\"owner_uuid\":\"4xphq-tpzed-8i6n6lswodedghy\",\"name\":\"diagnostics hash output\",\"portable_data_hash\":\"147249c1018bc6803b7b0fe26050b558+54\",\"manifest_text\":\". 2e4088cf8ddfd7eed5700fd1e9c946ae+87+A4049ccbde97ae393d6fa984031c8c3dc98feda78@57cdb846 0:87:md5sum.txt\\n\"}","ensure_unique_name":"true"},"@timestamp":"2016-08-22T18:24:20Z","@version":"1","message":"[200] POST /arvados/v1/collections (arvados/v1/collections#create)"}

Proposed solution

If the provided name is in use:
  • Use "name (timestamp)" where timestamp is rfc3339 with milliseconds.
  • If the chosen name is already taken due to a race with a concurrent request, try again, but give up after 10 attempts.
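
The proposal above can be sketched roughly as follows. This is a minimal illustration in Python, not the code that was merged: `InMemoryStore`, `NameTakenError`, `rfc3339_ms`, and `save_with_unique_name` are hypothetical names, the in-memory set stands in for the database's uniqueness constraint, and it assumes the 10 attempts count only the timestamped retries.

```python
from datetime import datetime, timezone

MAX_ATTEMPTS = 10  # the proposal gives up after 10 attempts


class NameTakenError(Exception):
    """Raised by the hypothetical store when a name is already in use."""


class InMemoryStore:
    """Hypothetical stand-in for the database's uniqueness constraint."""

    def __init__(self):
        self.names = set()

    def insert(self, name):
        if name in self.names:
            raise NameTakenError(name)
        self.names.add(name)


def rfc3339_ms(now=None):
    """Format a UTC datetime as RFC 3339 with millisecond precision."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


def save_with_unique_name(store, name, clock=rfc3339_ms):
    """Try the requested name, then fall back to "name (timestamp)"."""
    try:
        store.insert(name)
        return name
    except NameTakenError:
        pass  # the provided name is in use; switch to timestamped names
    for _ in range(MAX_ATTEMPTS):
        candidate = f"{name} ({clock()})"
        try:
            store.insert(candidate)
            return candidate
        except NameTakenError:
            continue  # lost a race with a concurrent request; try again
    raise NameTakenError(f"gave up after {MAX_ATTEMPTS} attempts: {name}")
```

In this sketch the common case costs one failed insert plus one successful insert, regardless of how many "name (N)" variants already exist, which is the property the title asks for.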

Subtasks

Task #10873: Review 9831-faster-unique-name (Resolved, Radhika Chippada)

Associated revisions

Revision bab78d47
Added by Radhika Chippada over 4 years ago

refs #9831
Merge branch '9831-fix-failing-workbench-test'

Revision 49db3b74
Added by Tom Clegg over 4 years ago

Merge branch '9831-faster-unique-name'

closes #9831

History

#1 Updated by Tom Clegg over 4 years ago

  • Description updated (diff)

#2 Updated by Tom Morris over 4 years ago

Total time is actually more like 15+ seconds:

15291.06 5773.91 POST create /arvados/v1/collections {"manifest_text":"","name":"New collection","ensure_unique_name":"true","alt":"json","collection":{"manifest_text":"","name":"New collection"}} null

The first column is total request time and the second is database time, both in milliseconds.

Perhaps this is blindingly obvious, but why do collection names need to be unique per user anyway? What happens when a collection is transferred to a different user where there's a name collision?

#3 Updated by Tom Morris over 4 years ago

Tom Morris wrote:

why do collection names need to be unique per user anyway? What happens when a collection is transferred to a different user where there's a name collision?

Transferring Tom's answer: Because otherwise we wouldn't have unique names in the directory hierarchy for the FUSE driver. Note that the "owner" in this case is the owning project, not the user, so there's a misconception in the formulation of my original question.

#4 Updated by Tom Morris over 4 years ago

  • Assigned To set to Tom Morris
  • Target version set to Arvados Future Sprints

I've seen collection serial #s as high as "New collection (8666)"

#5 Updated by Tom Clegg over 4 years ago

  • Category set to API
  • Status changed from New to In Progress
  • Assigned To changed from Tom Morris to Tom Clegg
  • Target version changed from Arvados Future Sprints to 2017-01-18 sprint
  • Story points set to 0.5

#6 Updated by Tom Clegg over 4 years ago

  • Description updated (diff)

#7 Updated by Tom Clegg over 4 years ago

  • Description updated (diff)

#9 Updated by Radhika Chippada over 4 years ago

@ daafdb4c

Since we are using a precision of 3 digits (milliseconds) in the name, I am wondering whether the system is so fast that we might end up doing too many retries, or running out of retries, when a name collision does occur. Would sleeping for 0.1s or so between retries be desirable?

LGTM otherwise. Thanks.

#10 Updated by Tom Clegg over 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:49db3b740b861688eff2a872c8f69f65ee893ed2.
