Idea #19592

open

Assigning a portable data hash to a project tree & project export/import

Added by Peter Amstutz about 2 years ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assigned To: -
Target version: -
Start date: 01/01/2025
Due date: 06/30/2025 (Due in about 6 months)
Story points: -
Release:
Release relationship: Auto

Description

Vision:

Arvados projects as "packages": bundles of data and code that can be assigned a version and distributed as copies to other Arvados instances. Users can track that they used a specific X.Y.Z version of a package, which is also identified by an immutable hash.

A history of package versions is kept, and it must be possible to reference or go back to earlier versions, as well as determine what changed between two versions.

Initial design thoughts:

  • Compute a hash for the entire contents of a project, including collections, subprojects, workflows, container requests, and containers
    • Could be built on per-record data hashes that cover most of each record's contents, including metadata such as creation and last-modified times.
  • Maintain a history of project versions
  • Copy a project to another cluster and compute a content hash that confirms that the content is the same
  • Determine what changed between two versions of a project
  • Apply a set of changes made to one copy of a project to another copy
  • Export the project to a file system hierarchy, and re-import the project later
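
One way the hashing and comparison ideas above could fit together is a Merkle-style tree hash. The sketch below is only an illustration under stated assumptions: records are plain dictionaries rather than Arvados API objects, the names `NON_PORTABLE`, `record_hash`, `tree_hash`, and `diff_children` are hypothetical, and the sample field values are made up. It assumes that `uuid` and `owner_uuid` differ between clusters and so are left out of the hash, while timestamps such as `modified_at` are included, as suggested in the design notes.

```python
import copy
import hashlib
import json

# Assumption: these fields differ between clusters, so they are not hashed.
# "children" is excluded here because child hashes are folded in separately.
NON_PORTABLE = {"uuid", "owner_uuid", "children"}

def record_hash(record):
    """Hash the portable fields of one record using canonical JSON."""
    portable = {k: v for k, v in record.items() if k not in NON_PORTABLE}
    canonical = json.dumps(portable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def tree_hash(item):
    """Combine an item's own hash with the sorted hashes of its children."""
    h = hashlib.sha256(record_hash(item).encode("ascii"))
    for child in sorted(tree_hash(c) for c in item.get("children", [])):
        h.update(child.encode("ascii"))
    return h.hexdigest()

def diff_children(old, new):
    """Compare direct children by name: added, removed, or changed."""
    a = {c["name"]: tree_hash(c) for c in old.get("children", [])}
    b = {c["name"]: tree_hash(c) for c in new.get("children", [])}
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }

# Hypothetical project on cluster "aaaaa" (all values are placeholders).
copy_a = {
    "kind": "project", "uuid": "aaaaa-j7d0g-000000000000001", "name": "demo",
    "children": [
        {"kind": "collection", "uuid": "aaaaa-4zz18-000000000000002",
         "name": "reads", "portable_data_hash": "example-pdh-1",
         "modified_at": "2024-01-01T00:00:00Z"},
        {"kind": "workflow", "uuid": "aaaaa-7fd4e-000000000000003",
         "name": "align", "definition": "cwlVersion: v1.2"},
    ],
}

# The same project copied to cluster "bbbbb": only the UUIDs differ.
copy_b = copy.deepcopy(copy_a)
copy_b["uuid"] = "bbbbb" + copy_b["uuid"][5:]
for child in copy_b["children"]:
    child["uuid"] = "bbbbb" + child["uuid"][5:]

# A later version: one collection changed and one was added.
edited = copy.deepcopy(copy_a)
edited["children"][0]["portable_data_hash"] = "example-pdh-2"
edited["children"].append(
    {"kind": "collection", "name": "notes", "portable_data_hash": "example-pdh-3"}
)

print(tree_hash(copy_a) == tree_hash(copy_b))  # True: same portable content
print(diff_children(copy_a, edited))
```

Because each subproject contributes a single hash, two versions can be compared top-down and only the subtrees whose hashes differ need to be inspected. Matching children by name is a simplification; a real design would need a stable identity that survives renames and cross-cluster copies.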