Idea #3036

Updated by Tom Clegg almost 10 years ago

Background: 
 * Collections are immutable and have uuid = hash(manifest_text), unlike other objects, which have uuid = prefix-random. This makes it possible to compare collections (e.g., job outputs) bit for bit without even looking up the collection records.
 * Users can attach names to collections (like filenames in regular filesystems) by creating Link objects with link_class="name". This API is unwieldy. 
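
The content address in the first bullet can be sketched in Python. This is a minimal sketch, assuming the identifier is the MD5 of the stripped manifest followed by "+" and the stripped manifest's length in bytes, and assuming "stripped" means dropping every hint after the size on each block locator (such as a permission signature). The helper names are hypothetical, not the Arvados SDK's.

```python
import hashlib
import re

# Assumption: a block locator is "<32 hex digits>+<size>" optionally followed
# by "+hint" fields; stripping keeps only the hash and the size.
_LOCATOR_WITH_HINTS = re.compile(r"\b([0-9a-f]{32}\+\d+)\+\S+")


def strip_manifest(manifest_text: str) -> str:
    """Remove everything after the size from each block locator."""
    return _LOCATOR_WITH_HINTS.sub(r"\1", manifest_text)


def content_address(manifest_text: str) -> str:
    """Hypothetical helper: MD5 of the stripped manifest, '+', byte length."""
    stripped = strip_manifest(manifest_text).encode("utf-8")
    return f"{hashlib.md5(stripped).hexdigest()}+{len(stripped)}"


# Two job outputs holding the same data compare equal by identifier alone,
# even if their locators carry different signatures.
unsigned = ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt\n"
signed = ". acbd18db4cc2f85cedef654fccc4a4d8+3+Aabc123@5f000000 0:3:foo.txt\n"
print(content_address(unsigned) == content_address(signed))  # True
```

Stripping before hashing is what lets the comparison ignore per-user signatures and depend only on the data.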

 New behavior: 
 * Collections are mutable and have a name attribute.
 * To compare the content of collections bit for bit, look up the hash of each collection's manifest (its portable_data_hash).
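
Under the new behavior the record and its content address come apart: the uuid identifies the (mutable) collection, and the manifest hash identifies its content. A minimal Python sketch, assuming a simplified hash that skips locator stripping; the class, helper names, and example uuids are hypothetical, and the fields are only those named in this proposal.

```python
import hashlib
from dataclasses import dataclass, field


def manifest_hash(manifest_text: str) -> str:
    # Simplified: assumes manifest_text is already stripped of locator hints.
    data = manifest_text.encode("utf-8")
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


@dataclass
class Collection:
    uuid: str                 # assigned once; stays the same when the collection changes
    manifest_text: str        # mutable content
    name: str = ""            # replaces name links
    description: str = ""
    properties: dict = field(default_factory=dict)

    @property
    def portable_data_hash(self) -> str:
        # Derived from content, so it changes only when manifest_text changes.
        return manifest_hash(self.manifest_text)


def same_content(a: Collection, b: Collection) -> bool:
    """Bitwise content comparison: compare hashes, not uuids."""
    return a.portable_data_hash == b.portable_data_hash


m = ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt\n"
run1 = Collection("zzzzz-4zz18-000000000000001", m, name="run 1 output")
run2 = Collection("zzzzz-4zz18-000000000000002", m, name="run 2 output")
print(same_content(run1, run2))  # True, although the uuids differ
```

Renaming a collection touches only its mutable fields, so its portable_data_hash, and any comparison based on it, is unaffected.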

 (Certainly incomplete) list of changes/consequences: 
 * First step: allow clients to call collections.create without providing a uuid. 
 * Update the uuid→class regexps to accept collection uuids in the usual Arvados uuid format as well as portable_data_hashes.
 * Copy the current collection uuid values to portable_data_hash.
 * If a client provides portable_data_hash to collections.create, verify it the same way uuid is verified now: compare it to the portable_data_hash computed from the provided (stripped) manifest, and respond 422 if they don't match. Skip this check if the client provides no portable_data_hash.
 * Fix clients so they pass the expected portable_data_hash instead of uuid (or pass neither) and use the uuid provided by Arvados, rather than assuming the new collection's uuid will be a content address. 
 * Add usual mutable fields like "name", "description", and "properties" to the collections table. 
 * Remove "all collections are owned by root" logic. 
 * Remove "add a permission link for me after creating a collection" logic. 
 * Update Workbench to use collections' "name" attributes instead of name links. 
 * Migrate existing name links in the database to become new collections. 
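
The regexp change in the list above might look like the following Python sketch. The uuid layout (5-character cluster prefix, 5-character type infix, 15 random characters) follows the usual Arvados format; the infix table beyond "4zz18" for collections is illustrative only, and the function name is hypothetical.

```python
import re
from typing import Optional

# Usual Arvados uuid: 5-char cluster prefix, 5-char type infix, 15-char random part.
ARVADOS_UUID = re.compile(r"[a-z0-9]{5}-([a-z0-9]{5})-[a-z0-9]{15}")
# Content address: 32 hex digits, "+", size, optionally followed by "+hint" fields.
PORTABLE_DATA_HASH = re.compile(r"[0-9a-f]{32}\+\d+(?:\+\S+)?")

# Assumed infix table; only "4zz18" (collections) matters for this change.
TYPE_INFIX = {"4zz18": "Collection", "tpzed": "User", "j7d0g": "Group"}


def uuid_to_class(identifier: str) -> Optional[str]:
    """Map an identifier to a model class name, or None if unrecognized."""
    if PORTABLE_DATA_HASH.fullmatch(identifier):
        return "Collection"  # old-style collection uuid (a content address)
    m = ARVADOS_UUID.fullmatch(identifier)
    if m:
        return TYPE_INFIX.get(m.group(1))
    return None


print(uuid_to_class("acbd18db4cc2f85cedef654fccc4a4d8+3"))  # Collection
print(uuid_to_class("zzzzz-4zz18-0123456789abcde"))         # Collection
```

Checking the content-address pattern first keeps old links resolving to collections while new-style uuids go through the normal infix lookup.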
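
The collections.create changes (no client-supplied uuid, plus an optional portable_data_hash check answered with 422 on mismatch) can be sketched as a plain Python function. Everything here is hypothetical: the handler name, the (status, body) return shape, the placeholder cluster prefix "zzzzz", and the stripping rule, which assumes hints after the size are dropped from each locator.

```python
import hashlib
import re
import secrets
import string

_LOCATOR_WITH_HINTS = re.compile(r"\b([0-9a-f]{32}\+\d+)\+\S+")
_UUID_ALPHABET = string.ascii_lowercase + string.digits


def portable_data_hash(manifest_text: str) -> str:
    """MD5 of the stripped manifest, '+', and its length (assumed rule)."""
    stripped = _LOCATOR_WITH_HINTS.sub(r"\1", manifest_text).encode("utf-8")
    return f"{hashlib.md5(stripped).hexdigest()}+{len(stripped)}"


def new_collection_uuid(cluster: str = "zzzzz") -> str:
    """Random uuid in the usual Arvados shape; 'zzzzz' is a placeholder prefix."""
    suffix = "".join(secrets.choice(_UUID_ALPHABET) for _ in range(15))
    return f"{cluster}-4zz18-{suffix}"


def handle_create(params: dict) -> tuple:
    """Hypothetical collections.create handler returning (status, body)."""
    manifest = params.get("manifest_text", "")
    computed = portable_data_hash(manifest)
    supplied = params.get("portable_data_hash")
    # The check is skipped entirely when the client supplies no hash.
    if supplied is not None and supplied != computed:
        return 422, {"errors": [f"portable_data_hash {supplied} does not match {computed}"]}
    # The server, not the client, chooses the uuid.
    return 200, {"uuid": new_collection_uuid(),
                 "portable_data_hash": computed,
                 "manifest_text": manifest}


m = ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt\n"
status, body = handle_create({"manifest_text": m,
                              "portable_data_hash": "d41d8cd98f00b204e9800998ecf8427e+0"})
print(status)  # 422
```

A client updated as described in the list would read body["uuid"] from a successful response instead of deriving it from the manifest.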

 Unresolved: 
 * Existing Workbench links with old collection UUIDs should still work, and so should re-running old jobs that refer to them.
 ** Look up by @portable_data_hash@ if collections.get is called with an old-format collection UUID? (Redact the mutable fields?)
 * Should collection content still be immutable? Assign @uuid=hash(random+manifest)@ and keep @random@ in the collection record, so that a client that remembers only the UUID can still verify integrity?
 ** Alternative: record both @uuid@ and @portable_data_hash@ whenever referencing collections in job inputs, etc. 
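
One possible answer to the first open question, sketched in Python: if collections.get receives an old-format (content address) identifier, return a collection with that portable_data_hash, with the mutable fields blanked and uuid set to the requested hash, so the response looks like an old immutable collection. The in-memory record list, the function name, and the redaction choice are all assumptions for illustration.

```python
import re
from typing import Optional

_OLD_STYLE = re.compile(r"[0-9a-f]{32}\+\d+(?:\+\S+)?")
MUTABLE_FIELDS = ("name", "description", "properties")


def collections_get(records: list, identifier: str) -> Optional[dict]:
    """Hypothetical lookup over a list of collection records (dicts)."""
    if _OLD_STYLE.fullmatch(identifier):
        wanted = "+".join(identifier.split("+")[:2])  # drop hints, keep hash+size
        for rec in records:
            if rec["portable_data_hash"] == wanted:
                redacted = dict(rec)  # copy; never modify the stored record
                for name in MUTABLE_FIELDS:
                    redacted[name] = None
                # Present it the way an old client expects: uuid is the hash.
                redacted["uuid"] = wanted
                return redacted
        return None
    return next((rec for rec in records if rec["uuid"] == identifier), None)
```

Returning any match is safe for the content itself, because every match has an identical manifest; which mutable fields to expose is exactly the open question.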
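
The second open question (@uuid=hash(random+manifest)@) can be sketched as follows. Assumptions: MD5 as the hash (matching the content addresses used elsewhere), a fixed-length 32-hex-digit random value so the concatenation is unambiguous, and hypothetical function names. The proposal does not say how such a uuid would fit the usual prefix-infix format, so this sketch ignores that.

```python
import hashlib
import secrets
from typing import Optional, Tuple


def make_uuid(manifest_text: str, random_part: Optional[str] = None) -> Tuple[str, str]:
    """Return (uuid, random_part); random_part is stored in the collection record."""
    if random_part is None:
        random_part = secrets.token_hex(16)  # 32 hex digits, fixed length
    digest = hashlib.md5((random_part + manifest_text).encode("utf-8")).hexdigest()
    return digest, random_part


def verify(remembered_uuid: str, record: dict) -> bool:
    """A client that kept only the uuid re-derives it from the fetched record."""
    derived, _ = make_uuid(record["manifest_text"], record["random"])
    return derived == remembered_uuid


manifest = ". acbd18db4cc2f85cedef654fccc4a4d8+3 0:3:foo.txt\n"
uuid, rnd = make_uuid(manifest)
print(verify(uuid, {"manifest_text": manifest, "random": rnd}))        # True
print(verify(uuid, {"manifest_text": manifest + "x", "random": rnd}))  # False
```

Two collections with the same manifest get different uuids (different random values) but the same content hash. Because the uuid depends on the manifest, this scheme only works if collection content stays immutable, which is why the question pairs the two.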
