Story #3036

Updated by Tom Clegg over 7 years ago

Background:
* Collections have uuid = hash(manifest_text) and are immutable, unlike other objects, which have uuid=prefix-random. This makes it possible to do a bitwise comparison of collections (e.g., job outputs) without even looking up the collection itself.
* Users can attach names to collections (like filenames in regular filesystems) by creating Link objects with link_class="name". This API is unwieldy.
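The difference between the two identifier schemes in the background can be sketched as follows. This is only an illustration: the story says @hash(manifest_text)@ without naming the hash, so the MD5-plus-length format, the function names, and the @zzzzz-xxxxx@ prefix are all assumptions made for the example.

```python
import hashlib
import secrets


def content_address(manifest_text: str) -> str:
    """Identifier derived only from content (assumed format: MD5 hex digest + byte length)."""
    data = manifest_text.encode("utf-8")
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def random_uuid(prefix: str = "zzzzz-xxxxx") -> str:
    """prefix-random identifier, as other object types use (prefix is hypothetical)."""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    return prefix + "-" + "".join(secrets.choice(alphabet) for _ in range(15))


# Two jobs that produce the same manifest get the same identifier,
# so their outputs can be compared without fetching either collection.
manifest = ". d41d8cd98f00b204e9800998ecf8427e+0 0:0:empty.txt\n"
output_a = content_address(manifest)
output_b = content_address(manifest)
same_output = output_a == output_b

# prefix-random identifiers say nothing about content.
other_a = random_uuid()
other_b = random_uuid()
```

Because the identifier is a pure function of the manifest, any edit to the manifest would produce a different identifier, which is why collections are immutable under the current scheme.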

New behavior:
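* Collections are mutable and have a name attribute, like other objects.
* A collection's uuid no longer identifies its content. To do a bitwise comparison of content, look up the hash of each collection's manifest (portable_data_hash) and compare those.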
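A minimal sketch of the new behavior, assuming a hypothetical in-memory @Collection@ record and an MD5-plus-length content hash (neither is specified by this story): the uuid and name belong to the record, while content comparison goes through the manifest hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Collection:
    """Hypothetical mutable collection record (field names follow the story)."""
    uuid: str
    name: str
    manifest_text: str

    @property
    def portable_data_hash(self) -> str:
        # Assumed hash format: MD5 hex digest of the manifest + its byte length.
        data = self.manifest_text.encode("utf-8")
        return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def same_content(a: Collection, b: Collection) -> bool:
    """Bitwise comparison of content, independent of uuid and name."""
    return a.portable_data_hash == b.portable_data_hash


manifest = ". d41d8cd98f00b204e9800998ecf8427e+0 0:0:empty.txt\n"
first = Collection("zzzzz-4zz18-aaaaaaaaaaaaaaa", "run 1 output", manifest)
second = Collection("zzzzz-4zz18-bbbbbbbbbbbbbbb", "run 2 output", manifest)

# Renaming changes the record but not the content hash.
first.name = "reference output"
```

Here @same_content(first, second)@ holds even though the two records have different uuids and names; changing either @manifest_text@ would break the equality.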

(Certainly incomplete) list of changes/consequences:
* First step: allow clients to call collections.create without providing a uuid.
* Update uuid→class regexps to accept collection uuids in the usual Arvados uuid format as well as portable_data_hashes.
* Copy current uuid values to portable_data_hash
* If a client provides portable_data_hash to collections.create, verify it the same way uuid is verified now: compare it to the portable_data_hash computed from the provided (stripped) manifest, and respond 422 if it doesn't match. Skip this check if the client provides no portable_data_hash.
* Fix clients so they pass the expected portable_data_hash instead of uuid (or pass neither) and use the uuid provided by Arvados, rather than assuming the new collection's uuid will be a content address.
* Add usual mutable fields like "name", "description", and "properties" to the collections table.
* Remove "all collections are owned by root" logic.
* Remove "add a permission link for me after creating a collection" logic.
* Update Workbench to use collections' "name" attributes instead of name links.
* Migrate existing name links in the database to become new collections.
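The identifier matching and the create-time check from the list above might look roughly like this. It is a sketch under assumptions, not the API server's code: the regular expressions, the @+A...@...@ signature-hint pattern used for "stripping", the MD5-plus-length hash format, the @zzzzz-4zz18@ prefix, and every function and class name here are illustrative.

```python
import hashlib
import re
import secrets

# Assumed identifier shapes: "usual" uuid = 5-5-15 lowercase alphanumerics,
# portable_data_hash = 32 hex digits, "+", manifest size.
ARVADOS_UUID = re.compile(r"^[0-9a-z]{5}-[0-9a-z]{5}-[0-9a-z]{15}$")
PORTABLE_DATA_HASH = re.compile(r"^[0-9a-f]{32}\+\d+$")
# Assumed shape of the signature hints removed from a "stripped" manifest.
SIGNATURE_HINT = re.compile(r"\+A[0-9a-f]+@[0-9a-f]+")


class UnprocessableEntity(Exception):
    """Stands in for an HTTP 422 response."""
    status = 422


def identifier_kind(identifier: str) -> str:
    """Classify a collection identifier as a uuid or a portable_data_hash."""
    if ARVADOS_UUID.match(identifier):
        return "uuid"
    if PORTABLE_DATA_HASH.match(identifier):
        return "portable_data_hash"
    raise ValueError(f"unrecognized collection identifier: {identifier!r}")


def strip_manifest(manifest_text: str) -> str:
    return SIGNATURE_HINT.sub("", manifest_text)


def portable_data_hash(manifest_text: str) -> str:
    data = strip_manifest(manifest_text).encode("utf-8")
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def create_collection(manifest_text, client_portable_data_hash=None, name=None):
    """Create a record; verify the client's hash only if one was supplied."""
    computed = portable_data_hash(manifest_text)
    if client_portable_data_hash is not None and client_portable_data_hash != computed:
        raise UnprocessableEntity("portable_data_hash does not match manifest")
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    random_part = "".join(secrets.choice(alphabet) for _ in range(15))
    return {
        "uuid": "zzzzz-4zz18-" + random_part,  # assigned by the server, not the client
        "portable_data_hash": computed,
        "name": name,
    }
```

A client following the updated convention would pass the expected hash (or nothing) and then read the uuid from the response rather than assuming it equals the content address.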

Unresolved:
* Existing Workbench links that use old collection UUIDs, and re-runs of old jobs that reference them, should still work.
** Look up by @portable_data_hash@ if collections.get is called with an old-format collection UUID? (Redact the mutable fields?)
* Collection content still immutable? Assign @uuid=hash(random+manifest)@ and keep @random@ in the collection record, so integrity can still be verified by a client that remembers only the UUID?
** Alternative: record both @uuid@ and @portable_data_hash@ whenever referencing collections in job inputs, etc.
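For discussion, the @uuid=hash(random+manifest)@ idea could work roughly as below. The hash function (SHA-256 here), the fixed-length hex salt, and all names are assumptions for the sketch; the story does not settle any of them.

```python
import hashlib
import secrets


def salted_uuid(random_salt: str, manifest_text: str) -> str:
    """uuid = hash(random + manifest); SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256((random_salt + manifest_text).encode("utf-8")).hexdigest()


def new_collection_record(manifest_text: str) -> dict:
    """Create a record that keeps the random salt alongside the manifest."""
    random_salt = secrets.token_hex(16)  # fixed length, so salt and manifest cannot blur together
    return {
        "uuid": salted_uuid(random_salt, manifest_text),
        "random": random_salt,
        "manifest_text": manifest_text,
    }


def verify(remembered_uuid: str, record: dict) -> bool:
    """A client that remembers only the uuid re-derives it from the fetched record."""
    return salted_uuid(record["random"], record["manifest_text"]) == remembered_uuid


manifest = ". d41d8cd98f00b204e9800998ecf8427e+0 0:0:empty.txt\n"
record = new_collection_record(manifest)
remembered = record["uuid"]
```

Under this scheme two collections with identical content get different uuids, and a record whose manifest has changed no longer verifies against the remembered uuid, so content stays immutable while fields outside the hash (such as name) remain editable.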
