Feature #13109

Updated by Lucas Di Pentima about 3 years ago

See [[Collection version history]]

For some types of collections, particularly things like reference data, it is desirable to keep old versions around if they are updated.

h2. User-facing features

* A collection has a current version number; the pair (uuid, version_nr) is enough as a reference to a specific version of a particular collection.
* Whenever a collection gets its @manifest_text@, @description@, @properties@ or @name@ fields updated, a new version is created (a 'snapshot' of the collection-to-be-changed record is created, pointing to the most current version).
* The user can request a collection via an API call that includes past versions. The user can search on collections including past versions.
* Whenever a collection changes owners, uuid, storage classes, replication levels and trashed status, its past versions follow it.
* In order to modify a past version, the user needs to copy it into a new collection.

Q: *One possible implementation is copy-on-write where a new copy of the object is made if certain important attributes change (only manifest?). The new collection should have a link to the source that it is cloned from.*
A: When some important attribute changes, before saving it to the database, clone the collection so that it'll have the same PDH but a different UUID, and this will be treated as an ancestor of the "source" collection. The source collection will then receive the updates, and also get an ancestor field updated with the cloned collection's UUID.

Q: *Do UUIDs move to the new collection? If they don't, it might mess up things like FUSE mounts and Keep Web.*
A: Depending on what we call "new collection". When an existing collection is getting updated, a "new collection" will be created as a snapshot, keeping its PDH for reference purposes, and getting a new UUID. The original collection record will then be updated and, because of that, its PDH will change but its UUID will be the same. FUSE & keep web will continue to work as they're working nowadays.

Q: *Do we need the ability to lock collections?*
A: To avoid concurrency issues (2 updates at the same time), we might need to wrap the cloning-and-updating operation into a transaction.
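The snapshot-on-update behavior described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory @store@ dict stands in for the collections table, random UUIDs stand in for real collection UUIDs, and @new_collection@ and @update_collection@ are hypothetical helper names, not existing API calls. A real implementation would also wrap the snapshot and update in a database transaction.

```python
import copy
import uuid as uuidlib
from dataclasses import dataclass, field

# Fields whose change triggers a new version, per the feature list.
VERSIONED_FIELDS = {"manifest_text", "description", "properties", "name"}


@dataclass
class Collection:
    uuid: str
    current_version_uuid: str  # equals uuid on the current version
    version_number: int = 1    # consecutive integer, starting at 1
    name: str = ""
    description: str = ""
    manifest_text: str = ""
    properties: dict = field(default_factory=dict)


def new_collection(**attrs):
    """Create a collection at version 1 that is its own current version."""
    u = str(uuidlib.uuid4())  # assumption: stand-in for a real collection UUID
    return Collection(uuid=u, current_version_uuid=u, **attrs)


def update_collection(store, coll_uuid, **changes):
    """Apply changes; snapshot the old state first if a versioned field changes."""
    current = store[coll_uuid]
    if current.current_version_uuid != current.uuid:
        # Past versions are read-only: the user must copy them instead.
        raise ValueError("past versions cannot be modified")
    changed = {k for k, v in changes.items() if getattr(current, k) != v}
    if changed & VERSIONED_FIELDS:
        snapshot = copy.deepcopy(current)   # keeps the old content and version number
        snapshot.uuid = str(uuidlib.uuid4())  # snapshot gets its own UUID
        store[snapshot.uuid] = snapshot     # still points at current.uuid
        current.version_number += 1
    for k, v in changes.items():
        setattr(current, k, v)
    return current
```

With this shape, the updated record keeps its UUID, so anything that refers to the collection by UUID keeps seeing the latest content, while the snapshot preserves the previous state under a new UUID.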

h3. On workbench

* A new 'History' tab shows the currently viewed collection's position on a list of versions.
* The collection's main pane shows an indication of whether it's the current version or an old one.
* When viewing a past version, if data blocks aren't retained, disable view/download buttons on the File tab, also indicating that the data blocks aren't available.
* 'Copy to project...' button should also be disabled when data blocks aren't retained.
* New button 'Expunge data' on old versions, with a confirmation dialog explaining what's about to happen.

Q: *What is the interaction between versions and expiration dates?*
A: Previous versions shouldn't be expiring before the latest one, so if a collection with trash_at date set gets updated and later on this attribute changes, previous versions should be synced. If this is the case, we might need to add an additional field to link to the "next version" so that we can quickly validate that trash_at can only be updated by the user on the latest version. This would also facilitate back & forth version navigation.

Q: *What does the UI for this look like? Do previous versions get filtered by default?*
A: API-wise we should be filtering the previous versions from listings and also making them immutable. We could add a new api response field with an ordered list of the previous versions' uuids, or if that's too expensive, a separate endpoint.
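The default filtering and the History tab listing could look something like the sketch below. It assumes collection records are plain dicts carrying the @current_version_uuid@, @version_number@ and @data_blocks_retained@ columns proposed in this issue; the function names and the @include_past_versions@ flag are hypothetical and only illustrate the idea.

```python
def list_collections(records, include_past_versions=False):
    """Return records, hiding past versions unless explicitly requested."""
    if include_past_versions:
        return list(records)
    # A record is the current version when it points at itself.
    return [r for r in records if r["uuid"] == r["current_version_uuid"]]


def version_history(records, current_uuid):
    """Return every version of one collection, oldest first (History tab)."""
    versions = [r for r in records if r["current_version_uuid"] == current_uuid]
    return sorted(versions, key=lambda r: r["version_number"])


def can_access_data(record):
    """False means view/download and 'Copy to project...' should be disabled."""
    return bool(record["data_blocks_retained"])
```

Keeping the filter in the listing function means callers only see past versions when they ask for them, which matches the idea of hiding them by default.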

h2. System wide configurations

* Flag to enable version history retention (OFF by default)
* Flag to enable data blocks retention on old versions. (ON by default?)
* Flag to allow users to drop data blocks retention on old versions of a collection.

Q: *Do all versions get trashed & moved as a group?*
A: For simplicity's sake, I think we should be treating previous versions as part of the latest collection. We could later on add new behavior taking into consideration what happens if the version chain is broken because some ...

h2. Implementation details

* All past versions go in the same collections table (so it's easier to do paging)
* New column @current_version_uuid@ to hold the current version's UUID.
* New column @version_number@ to hold a consecutive integer, starting at 1 for new collections.
* New boolean column @data_blocks_retained@ as a hint for @keep-balance@ to keep/delete old versions' blocks. Can be updated only from True to False?
* The following fields are synced with their past versions' counterparts: replication_*, storage_classes_*, trash_at/delete_at/is_trashed, owner_uuid, uuid (update current_uuid to retain database consistency)
* Old versions with the same name shouldn't conflict with each other or other collections.
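
The field syncing and the one-way @data_blocks_retained@ rule could be sketched as below. This assumes records are plain dicts; @sync_past_versions@ and @set_data_blocks_retained@ are hypothetical names, and only a subset of the synced fields is listed because the replication and storage class columns are given above with wildcards. The True-to-False restriction is still an open question in the list, so treat that part as one possible reading.

```python
# Assumption: a subset of the synced fields; replication_* and
# storage_classes_* are omitted because their exact column names
# are not spelled out in this issue.
SYNCED_FIELDS = ("owner_uuid", "trash_at", "delete_at", "is_trashed")


def sync_past_versions(records, current):
    """Copy the synced fields from the current version to its past versions."""
    for r in records:
        is_past_version = (
            r["current_version_uuid"] == current["uuid"]
            and r["uuid"] != current["uuid"]
        )
        if is_past_version:
            for f in SYNCED_FIELDS:
                r[f] = current[f]


def set_data_blocks_retained(record, value):
    """Allow only the True -> False transition."""
    if value and not record["data_blocks_retained"]:
        raise ValueError("data_blocks_retained cannot go from False back to True")
    record["data_blocks_retained"] = value
```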