Idea #3699

closed

[SDKs] Copy a pipeline instance, along with its input and output data, from one arvados instance to another

Added by Tom Clegg over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tim Pierce
Category: SDKs
Target version:
Start date: 08/29/2014
Due date:
Story points: 1.0

Description

Use case: user can copy a pipeline instance between Arvados instances, in order to rerun a pipeline on another cluster and compare results with the original computation. Example:

  1. User runs arv-copy 1h9kt-pipeline-uuid 1h9kt 4xphq to copy instance 1h9kt-pipeline-uuid to cluster 4xphq
  2. User views the new pipeline instance on 4xphq's workbench
  3. User clicks "run" on the copied pipeline template page (selecting an appropriate input collection, probably the input collection that was copied along with the pipeline instance and template)
  4. Jobs run.
  5. User uses "compare pipelines" on 4xphq to compare the copy of the original 1h9kt pipeline instance with the new 4xphq instance that was just generated.

Syntax:

$ arv-copy [--recursive=true/false] [pipeline-instance-uuid] [source-arvados] [destination-arvados]

By default, arv-copy exports the specified pipeline instance from the source-arvados instance and imports it to destination-arvados. arv-copy makes the following changes to the pipeline instance before importing it:
  • renames uuid to properties.copied_from_pipeline_instance_uuid
  • removes owner_uuid
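
A minimal sketch of the two changes above, assuming the pipeline instance has already been fetched as a plain dict. The function name prepare_for_import is hypothetical and is not taken from the arv-copy source.

```python
import copy


def prepare_for_import(pipeline_instance):
    """Return a modified copy of the record; the input dict is left untouched."""
    record = copy.deepcopy(pipeline_instance)

    # Rename uuid to properties.copied_from_pipeline_instance_uuid.
    properties = record.get("properties") or {}
    properties["copied_from_pipeline_instance_uuid"] = record.pop("uuid")
    record["properties"] = properties

    # Remove owner_uuid, as this story specifies.
    record.pop("owner_uuid", None)
    return record
```

Working on a deep copy keeps the exported record available for logging or for a retry if the import fails.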
The --recursive option, which defaults to true, also copies the following data:
  • collections (copy blocks and then copy manifest_text)
    • Finding collections to copy: For each component in pipeline_instance.components(), append component.job().dependencies()
  • docker images (collection copy + docker specific tags)
    • Copy docker images identified by collection hash in docker_image_locator
  • pipeline templates (copy name, components)
  • git repository (clone entire repository; update name of repository to use in components of target pipeline template)
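
The "finding collections to copy" step could look roughly like the sketch below. It assumes the pipeline instance is a plain dict whose components each carry a "job" dict with a "dependencies" list of collection locators; that layout and the function name collections_to_copy are illustrative assumptions, not the confirmed API.

```python
def collections_to_copy(pipeline_instance):
    """Collect each job dependency once, in first-seen order."""
    seen = set()
    ordered = []
    for component in pipeline_instance.get("components", {}).values():
        # A component that has not run yet may have no job attached.
        job = component.get("job") or {}
        for locator in job.get("dependencies", []):
            if locator not in seen:
                seen.add(locator)
                ordered.append(locator)
    return ordered
```

De-duplicating matters here because several components commonly depend on the same input collection, and each collection only needs to be copied once.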

If --recursive=false, copy only the pipeline instance, but emit a warning that the user will have to fix the pipeline template UUID by hand.
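
One possible way to parse the syntax above with argparse, including the --recursive=false warning. This is a sketch only: the helper names, the warning wording, and the choice to accept only the literal values "true" and "false" are assumptions.

```python
import argparse
import sys


def parse_bool(text):
    """Accept only 'true' or 'false' (case-insensitive), per the syntax above."""
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise argparse.ArgumentTypeError("expected true or false, got %r" % text)


def build_parser():
    parser = argparse.ArgumentParser(prog="arv-copy")
    parser.add_argument("--recursive", type=parse_bool, default=True,
                        metavar="true/false")
    parser.add_argument("pipeline_instance_uuid")
    parser.add_argument("source_arvados")
    parser.add_argument("destination_arvados")
    return parser


def parse_args(argv, warn=sys.stderr):
    args = build_parser().parse_args(argv)
    if not args.recursive:
        # Hypothetical wording for the warning this story asks for.
        print("warning: only the pipeline instance will be copied; "
              "fix the pipeline template UUID by hand", file=warn)
    return args
```

For example, parse_args(["--recursive=false", "1h9kt-pipeline-uuid", "1h9kt", "4xphq"]) returns the parsed arguments and writes the warning to stderr.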

arv-copy returns an error if pipeline-instance-uuid refers to an object that cannot be copied between instances. For this story, arv-copy is only guaranteed to work on pipeline instance UUIDs. Future stories may expand this feature.
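
A sketch of the "cannot be copied" check. It assumes Arvados UUIDs have the form cluster prefix, object-type infix, identifier (5, 5 and 15 lowercase alphanumeric characters) and that "d1hrv" is the infix for pipeline instances; both should be verified against the API documentation. The exception and function names are hypothetical.

```python
import re

# Assumption: object-type infix used for pipeline instances.
PIPELINE_INSTANCE_INFIX = "d1hrv"
UUID_PATTERN = re.compile(r"^([0-9a-z]{5})-([0-9a-z]{5})-([0-9a-z]{15})$")


class UncopyableObjectError(ValueError):
    """Raised when a UUID does not refer to a supported object type."""


def require_pipeline_instance(uuid):
    """Return the cluster prefix if uuid names a pipeline instance, else raise."""
    match = UUID_PATTERN.match(uuid)
    if match is None or match.group(2) != PIPELINE_INSTANCE_INFIX:
        raise UncopyableObjectError(
            "cannot copy %r: not a pipeline instance UUID" % uuid)
    return match.group(1)
```

Checking the type before contacting either cluster lets the tool fail early, before any data has been transferred.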

A warning is issued if arv-copy is asked to copy a pipeline instance in which:
  • one or more components include runtime_dependencies with a docker_image field, which is a symbolic name for a Docker image
  • one or more components use symbolic names for git revisions (e.g. a branch name, "master", etc.)
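
The two warning conditions could be detected along these lines. The sketch assumes each component is a dict with an optional "runtime_dependencies" dict (the field name used in this story) and a "script_version" string naming the git revision; the latter field name and the function name are assumptions. Anything that is not a full 40-character hex commit hash is treated as symbolic.

```python
import re

FULL_SHA1 = re.compile(r"^[0-9a-f]{40}$")


def symbolic_reference_warnings(pipeline_instance):
    """Return one warning string per symbolic Docker image or git revision."""
    warnings = []
    for name, component in pipeline_instance.get("components", {}).items():
        deps = component.get("runtime_dependencies") or {}
        if "docker_image" in deps:
            warnings.append("component %s uses symbolic Docker image %r"
                            % (name, deps["docker_image"]))
        version = component.get("script_version", "")
        if version and not FULL_SHA1.match(version):
            warnings.append("component %s uses symbolic git revision %r"
                            % (name, version))
    return warnings
```

Abbreviated commit hashes are also flagged by this check, since they can become ambiguous as a repository grows.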

For copying git commits, it is critical that we preserve the commit hashes between repositories, which means copying the commit history. This Stack Overflow question will probably provide important guidance: http://stackoverflow.com/questions/1365541/how-to-move-files-from-one-git-repo-to-another-not-a-clone-preserving-history
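
One way to keep commit hashes identical is to transfer the objects unchanged with a mirror clone followed by a mirror push. The sketch below only illustrates that technique; it is not necessarily the approach from the linked question or the one arv-copy ends up using. The function names, URLs and working directory are placeholders.

```python
import subprocess


def mirror_commands(source_url, destination_url, workdir):
    """Build the git commands that copy every ref and its full history."""
    return [
        # Bare clone containing all branches and tags, history intact.
        ["git", "clone", "--mirror", source_url, workdir],
        # Push every ref as-is; commit hashes are unchanged.
        ["git", "--git-dir", workdir, "push", "--mirror", destination_url],
    ]


def run_mirror(source_url, destination_url, workdir):
    for command in mirror_commands(source_url, destination_url, workdir):
        subprocess.check_call(command)
```

Note that git push --mirror also deletes refs on the destination that do not exist in the source, so this is only safe when the destination repository is new or dedicated to the copy.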


Files

arv-copy-perf.png (25.8 KB), Peter Amstutz, 10/21/2014 04:06 PM

Subtasks 7 (0 open, 7 closed)

Task #3742: arv-copy works on collections (Resolved, Tim Pierce, 09/03/2014)
Task #3744: arv-copy works on pipeline instances (Resolved, Tim Pierce, 09/04/2014)
Task #3743: arv-copy works on docker images (Resolved, Tim Pierce, 09/04/2014)
Task #3758: arv-copy works on git repos (Resolved, Tim Pierce, 09/09/2014)
Task #3838: Review 3699-arv-copy (Resolved, Peter Amstutz, 08/29/2014)
Task #3784: arv-copy works on pipeline templates (Resolved, Tim Pierce, 08/29/2014)
Task #3759: arv-copy authenticates to multiple Arvados instances (Resolved, Tim Pierce, 08/29/2014)