Idea #19592
Updated by Peter Amstutz 4 months ago
Vision: Arvados projects as "packages": a bundle of data and code to which a version can be assigned, and copies distributed far and wide to other Arvados instances. Users are able to track that they used a specific X.Y.Z version (also identified by immutable hash) of a package. A history of package versions is kept, and it must be possible to reference or go back to earlier versions, as well as determine what changed between two versions. Initial design thoughts: * Compute a hash for an snapshot entire project contents, including collections, subprojects, workflows, container requests, and containers projects ** Could be built on computing data hashes for records that cover the majority of the record contents, including metadata such as creation/last modified time. * Maintain assign a history of project versions content address * Copy a export project to another cluster and compute contents (including keep blocks) as a content hash that confirms that the content is the same * Determine what changed between two versions directory tree of a project plain files * Apply a set of changes that were made import to one copy of a project, to another copy cluster * Export the project to a file system hierarchy, and re-import the project later