Story #3699

Updated by Tim Pierce over 5 years ago

Use case: user can copy a pipeline instance between Arvados instances, in order to rerun a pipeline on another cluster and compare results with the original computation. Example:

# User runs @arv-copy 1h9kt-pipeline-uuid 1h9kt 4xphq@ to copy instance @1h9kt-pipeline-uuid@ to cluster 4xphq
# User views the new pipeline instance on 4xphq's workbench
# User clicks "run" on the copied pipeline template page (selecting an appropriate input collection, probably the input collection that was copied along with the pipeline instance and template)
# Jobs run.
# User uses "compare pipelines" on 4xphq to compare the copy of the original 1h9kt pipeline instance with the new 4xphq instance that was just generated.

$ arv-copy [--recursive=true/false] [pipeline-instance-uuid] [source-arvados] [destination-arvados]

By default, arv-copy exports the specified pipeline instance from the _source-arvados_ instance and imports it into _destination-arvados_. arv-copy makes the following changes to the pipeline instance before importing it:
* renames @uuid@ to @properties.copied_from_pipeline_instance_uuid@
* removes @owner_uuid@
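
The two changes above can be sketched as a small pure function. This is only an illustration, assuming the pipeline instance is available as a plain dictionary (for example, as decoded from an API response); the function name @prepare_for_import@ and the sample owner UUID are hypothetical and not part of arv-copy.

```python
import copy


def prepare_for_import(pipeline_instance):
    """Return a modified copy of a pipeline instance record (hypothetical helper).

    Assumes the record is a plain dict. The source record is left untouched.
    """
    record = copy.deepcopy(pipeline_instance)

    # "Rename" uuid: drop it from the top level and keep it under properties.
    source_uuid = record.pop("uuid", None)
    properties = dict(record.get("properties") or {})
    if source_uuid is not None:
        properties["copied_from_pipeline_instance_uuid"] = source_uuid
    record["properties"] = properties

    # Remove owner_uuid so the destination can assign its own owner.
    record.pop("owner_uuid", None)
    return record


if __name__ == "__main__":
    original = {
        "uuid": "1h9kt-pipeline-uuid",
        "owner_uuid": "1h9kt-example-owner",  # hypothetical value
        "name": "example pipeline",
        "components": {},
    }
    print(prepare_for_import(original))
```

Working on a deep copy keeps the exported record intact, which may be useful if the import fails and has to be retried.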

The @--recursive@ option, which defaults to true, also copies the following data:
* collections (copy blocks and then copy manifest_text)
** Finding collections to copy: For each component in pipeline_instance.components(), append component.job().dependencies()
* Docker images (collection copy + Docker-specific tags)
** Copy Docker images identified by collection hash in @docker_image_locator@
* pipeline templates (copy name, components)
* git repository (clone entire repository; update name of repository to use in components of target pipeline template)
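
The "finding collections to copy" step can be sketched as follows. This is a rough illustration, not the arv-copy implementation: it assumes the pipeline instance is a plain dictionary whose @components@ map each component name to a record with an embedded @job@, and that the job carries @dependencies@ (a list of collection hashes) and optionally @docker_image_locator@. That record layout and the function name @collections_to_copy@ are assumptions.

```python
def collections_to_copy(pipeline_instance):
    """List the collection hashes a recursive copy would need (hypothetical helper).

    Assumed layout: components -> {name: {"job": {"dependencies": [...],
    "docker_image_locator": "..."}}}. Duplicates are dropped, order is kept.
    """
    seen = set()
    ordered = []
    components = pipeline_instance.get("components") or {}
    for name in sorted(components):
        job = components[name].get("job") or {}
        locators = list(job.get("dependencies") or [])
        # Docker images are stored as collections too, so copy them the same way.
        docker_locator = job.get("docker_image_locator")
        if docker_locator:
            locators.append(docker_locator)
        for locator in locators:
            if locator not in seen:
                seen.add(locator)
                ordered.append(locator)
    return ordered


if __name__ == "__main__":
    example = {
        "components": {
            "align": {"job": {"dependencies": ["hash-a", "hash-b"],
                              "docker_image_locator": "hash-img"}},
            "call": {"job": {"dependencies": ["hash-b"]}},
        }
    }
    print(collections_to_copy(example))
```

Deduplicating before copying avoids transferring the same collection twice when several components share an input.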

If @--recursive=false@, arv-copy copies only the pipeline instance and emits a warning that the user will have to fix the pipeline template UUID by hand.
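
The command-line shape described above could be parsed along these lines. This is a sketch using Python's standard @argparse@ module; the helper names @parse_bool@ and @build_parser@ are hypothetical, and the real tool may accept additional options or spellings.

```python
import argparse


def parse_bool(text):
    """Convert the literal strings 'true' / 'false' to a bool."""
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise argparse.ArgumentTypeError("expected true or false, got %r" % text)


def build_parser():
    """Build a parser matching the usage line in this story (illustrative only)."""
    parser = argparse.ArgumentParser(prog="arv-copy")
    # --recursive defaults to true, as the story specifies.
    parser.add_argument("--recursive", type=parse_bool, default=True,
                        metavar="true/false")
    parser.add_argument("pipeline_instance_uuid")
    parser.add_argument("source_arvados")
    parser.add_argument("destination_arvados")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--recursive=false", "1h9kt-pipeline-uuid", "1h9kt", "4xphq"])
    if not args.recursive:
        print("warning: pipeline template UUID must be fixed by hand")
```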

arv-copy returns an error if _pipeline-instance-uuid_ refers to an object that cannot be copied between instances. For this story, arv-copy is only guaranteed to work on pipeline instance UUIDs. Future stories may expand this feature.

A warning is issued if arv-copy is asked to copy a pipeline instance in which:
* one or more components include runtime_dependencies with a docker_image field, which is a symbolic name for a Docker image
* one or more components use symbolic names for git revisions (e.g. a branch name such as "master")
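
A check for these two conditions might look like the sketch below. It assumes each component is a plain dictionary, that the git revision is stored in a field named @script_version@ (an assumption; this story does not name the field), and that any revision that is not a full 40-character hexadecimal SHA-1 counts as symbolic. The function name @portability_warnings@ is hypothetical.

```python
import re

# A full git commit id is 40 lowercase hexadecimal characters (SHA-1).
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def portability_warnings(pipeline_instance):
    """Return warning strings for symbolic Docker image or git names."""
    warnings = []
    components = pipeline_instance.get("components") or {}
    for name in sorted(components):
        component = components[name]

        # Field name taken from this story; the value is a symbolic image name.
        deps = component.get("runtime_dependencies") or {}
        if "docker_image" in deps:
            warnings.append("%s: docker_image %r is a symbolic name"
                            % (name, deps["docker_image"]))

        # "script_version" is an assumed field name for the git revision.
        revision = component.get("script_version")
        if revision is not None and not FULL_SHA1.fullmatch(revision):
            warnings.append("%s: git revision %r is a symbolic name"
                            % (name, revision))
    return warnings


if __name__ == "__main__":
    example = {"components": {"align": {
        "script_version": "master",
        "runtime_dependencies": {"docker_image": "example/image"},
    }}}
    for line in portability_warnings(example):
        print("warning:", line)
```

Symbolic names are flagged because a branch or image tag may resolve to different content on the destination cluster, which would undermine the comparison with the original run.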

For copying git commits, it is critical that we preserve the commit hashes between repositories, which means copying the commit history. This Stack Overflow question will probably provide important guidance: