Bug #5288

[Documentation] Write user guide for arv-copy

Added by Brett Smith over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To: Bryan Cosca
Category: Documentation
Target version:
Start date: 07/14/2015
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: 2.0

Description

Write a guide that walks a user through the process of starting from no configuration, and using arv-copy to copy a pipeline template from one cluster to another. This includes:

  • Work from a machine with an SSH key that's allowed to access both clusters (required to copy git repositories).
  • Set up configuration files in ~/.config/arvados/, named by cluster.
  • The actual invocation, emphasizing that --src and --dst arguments match filenames in ~/.config/arvados.
  • Highlight other interesting options that people are likely to want to use, like --project-uuid. (Maybe that's the only one? I don't feel 100% sure.)
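
The steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the guide itself: the KEY=value file layout, the helper names, and the placeholder hosts and tokens are all hypothetical. Only the ~/.config/arvados/ location, the cluster-named files, and the --src, --dst, and --project-uuid options come from this ticket. The sketch writes into a temporary directory and only builds the command line, so nothing is copied.

```python
import tempfile
from pathlib import Path


def write_cluster_config(config_dir, cluster, api_host, api_token):
    """Write <cluster>.conf into config_dir and return its path.

    Assumption: the file holds plain KEY=value lines.
    """
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"{cluster}.conf"
    path.write_text(
        f"ARVADOS_API_HOST={api_host}\n"
        f"ARVADOS_API_TOKEN={api_token}\n"
    )
    path.chmod(0o600)  # the token is a credential, so keep the file private
    return path


def build_arv_copy_command(src, dst, object_uuid, project_uuid=None):
    """Return the argv list for one arv-copy call (nothing is executed)."""
    argv = ["arv-copy", "--src", src, "--dst", dst]
    if project_uuid is not None:
        argv += ["--project-uuid", project_uuid]
    argv.append(object_uuid)
    return argv


# Throwaway directory standing in for ~/.config/arvados/.
demo_dir = Path(tempfile.mkdtemp()) / "arvados"
src_conf = write_cluster_config(
    demo_dir, "qr1hi", "qr1hi.example.invalid", "SRC_TOKEN_PLACEHOLDER")
dst_conf = write_cluster_config(
    demo_dir, "dst_cluster", "dst.example.invalid", "DST_TOKEN_PLACEHOLDER")

# The --src and --dst values are the config file names without ".conf".
command = build_arv_copy_command(
    src_conf.stem, dst_conf.stem, "qr1hi-4zz18-tci4vn4fa95w0zx")
print(" ".join(command))
```

Deriving the --src and --dst values from the file names keeps the two in step, which is the point the description asks the guide to emphasize.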

Subtasks

Task #6286: Review branch: 5288-arv-copy-documentation (Closed, Bryan Cosca)


Related issues

Has duplicate Arvados - Bug #5294: [SDKs] Make arv-copy useful for copying a Docker image (Closed, 02/23/2015)

Associated revisions

Revision 66c19e11
Added by bryan over 6 years ago

Merge branch '5288-arv-copy-documentation' refs #5288

History

#1 Updated by Brett Smith over 6 years ago

  • Target version changed from Arvados Future Sprints to 2015-07-08 sprint

#2 Updated by Brett Smith over 6 years ago

  • Story points set to 1.0

#3 Updated by Brett Smith over 6 years ago

  • Assigned To set to Brett Smith

#4 Updated by Tom Clegg over 6 years ago

Free suggestions:
  • a couple of worked examples, maybe the simplest case (collection) and the hairiest case (pipeline instance) are good ones to start?
  • help with configs
  • link to the doc page from arv-copy --help

#5 Updated by Brett Smith over 6 years ago

  • Description updated (diff)
  • Target version changed from 2015-07-08 sprint to 2015-07-22 sprint

#6 Updated by Brett Smith over 6 years ago

  • Assigned To deleted (Brett Smith)

#7 Updated by Brett Smith over 6 years ago

  • Assigned To set to Bryan Cosca
  • Story points changed from 1.0 to 2.0

#8 Updated by Brett Smith over 6 years ago

  • Description updated (diff)

#9 Updated by Bryan Cosca over 6 years ago

Radhika, this is ready for review.

I left the example very general, so in order to follow, you need to use your own files.

An example collection: https://cloud.curoverse.com/collections/qr1hi-4zz18-tci4vn4fa95w0zx
An example pipeline template: https://cloud.curoverse.com/pipeline_templates/qr1hi-p5p6p-9pkaxt6qjnkxhhu
An example pipeline instance: https://cloud.curoverse.com/pipeline_instances/qr1hi-d1hrv-nao0ohw8y7dpf84
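
The three identifiers above share one shape: a cluster prefix, a type code, and an object-specific suffix, joined by hyphens. A small sketch of pulling those parts apart is shown here. The helper names are hypothetical, and the type-code table is read off the URL paths in these three links rather than taken from any official list.

```python
# Type codes read off the three example URLs above.
TYPE_CODES = {
    "4zz18": "collection",
    "p5p6p": "pipeline template",
    "d1hrv": "pipeline instance",
}


def split_uuid(uuid):
    """Split 'prefix-type-suffix' into its three parts."""
    parts = uuid.split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a three-part uuid: {uuid!r}")
    return tuple(parts)


def describe(uuid):
    """Return (cluster prefix, object type) for a uuid."""
    prefix, type_code, _suffix = split_uuid(uuid)
    return prefix, TYPE_CODES.get(type_code, "unknown")


for example in (
    "qr1hi-4zz18-tci4vn4fa95w0zx",
    "qr1hi-p5p6p-9pkaxt6qjnkxhhu",
    "qr1hi-d1hrv-nao0ohw8y7dpf84",
):
    print(example, "->", describe(example))
```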

I also think this is blocked by #6014 because repository creation is important to copying pipeline templates and instances.

#10 Updated by Bryan Cosca over 6 years ago

  • Status changed from New to In Progress

#11 Updated by Radhika Chippada over 6 years ago

Review feedback for 9d52ea83290cab293229815286579ab6a1584f9e

  • Title: “Arv-copy” -> should it be “arv-copy”, given that it is a command and typing “Arv-copy” does not actually work? Please change all “Arv-copy” occurrences to “arv-copy”
  • “allows users to copy collections, pipeline templates, and pipeline instances, including all their dependencies from one cluster to another”
    • Please mention that the recursive deep copy is the default
    • add an example (later on in the page) to copy just one item without copying the dependencies
  • It was quite confusing for me to understand what the config file name for “qr1hi” should be. Since the workbench URL is “https://cloud.curoverse.com/”, I could not tell what the file name should be. How can we make it clear to the user? Do you want to say that the filename will be “<uuid_prefix>.conf” and that the uuid_prefix is found on the manage_account page (it is the first word in ARVADOS_API_HOST)?
  • Same issue with "cluster1" and "cluster2" in the example commands. Is it the uuid prefix "qr1hi", or "cloud", or something else? I could not tell
  • Can you please use at least one real env? I think using qr1hi instead of cluster1 might help. In that case, you can make cluster2 as “dst_cluster” …
  • Can you please include real examples in the commands? Instead of “cluster1-4zz18-1234567890abcde” and “cluster1-p5p6p-abcd123efghi45jkl”, would you be able to include some public project items in these examples?
  • Please add a brief explanation as to what happens after copying. For example, after I arv-copy’ed a collection, the command displayed me the uuid of the collection generated. Please tell the user that this can be found in the user’s Home project.
  • I arv-copy'ed an instance (qr1hi-d1hrv-fom17oygoxj1j33) and this copied 3 collections, one pipeline instance and template each. And they all were found in the Home project. Can you please clarify this (probably by using a real instance with dependencies)?
  • Not sure about this: “This part of the tutorial assumes that you are working from a machine with an SSH key that is allowed to access both clusters (required to copy git repositories)”. What exactly is needed here? I had the same ssh key added in both my qr1hi and 4xphq manage_account pages. However, I did not have anything in the ~/.ssh dir when I logged into my 4xphq shell account, and what I had in qr1hi is different from what I added in the manage_account page. And I was able to arv-copy in this situation. So I do not think this statement is quite comprehensive. Please clarify.
  • This statement: “As well as a git repository in the destination cluster”. Can you please say something like “this tutorial expects that you are working with the ‘tutorial’ repository created in ‘Adding a new arvados repository’ page”.
  • Please use “$USER/tutorial.git” repository in place of “samplegitrepo/samplename.git”
  • This statement is not clear to me: “New branches in the destination git repo will be created for each branch used in the pipeline template. For example, if your source branch was named branch_name, your new branch will be named git_git_cluster1_arvadosapi_com_reponame_git_branch_name.”
    • Would you be able to clarify what the branches are here? Is it possible to provide a real example by copying a real template?
    • Can you please replace the “reponame” here with “tutorial”, if that is what we are using in this tutorial.
  • From a shell terminal logged into qr1hi, I was able to use arv-copy when the src is qr1hi and dst is 4xphq, as well as when src is 4xphq and dst is qr1hi. Which is what is expected. Just wondering if this is something you would want to clarify that you can be logged into any of the shell accounts.
  • I think it does not hurt to include a manage_account page picture to clarify where to find the API_TOKEN and uuid_prefix etc.

Thanks.

#12 Updated by Bryan Cosca over 6 years ago

Radhika Chippada wrote:

Review feedback for 9d52ea83290cab293229815286579ab6a1584f9e

  • Title: “Arv-copy” -> should it be “arv-copy”, given that it is a command and typing “Arv-copy” does not actually work? Please change all “Arv-copy” occurrences to “arv-copy”

Done

  • “allows users to copy collections, pipeline templates, and pipeline instances, including all their dependencies from one cluster to another”
    • Please mention that the recursive deep copy is the default
    • add an example (later on in the page) to copy just one item without copying the dependencies

Done

  • It was quite confusing for me to understand what the config file name for “qr1hi” should be. Since the workbench URL is “https://cloud.curoverse.com/”, I could not tell what the file name should be. How can we make it clear to the user? Do you want to say that the filename will be “<uuid_prefix>.conf” and that the uuid_prefix is found on the manage_account page (it is the first word in ARVADOS_API_HOST)?
  • Same issue with "cluster1" and "cluster2" in the example commands. Is it the uuid prefix "qr1hi", or "cloud", or something else? I could not tell
  • Can you please use at least one real env? I think using qr1hi instead of cluster1 might help. In that case, you can make cluster2 as “dst_cluster” …

Done.

  • Can you please include real examples in the commands? Instead of “cluster1-4zz18-1234567890abcde” and “cluster1-p5p6p-abcd123efghi45jkl”, would you be able to include some public project items in these examples?

I'm using qr1hi-4zz18-tci4vn4fa95w0zx as an example.

  • Please add a brief explanation as to what happens after copying. For example, after I arv-copy’ed a collection, the command displayed me the uuid of the collection generated. Please tell the user that this can be found in the user’s Home project.

Done

  • I arv-copy'ed an instance (qr1hi-d1hrv-fom17oygoxj1j33) and this copied 3 collections, one pipeline instance and template each. And they all were found in the Home project. Can you please clarify this (probably by using a real instance with dependencies)?

I used qr1hi-d1hrv-nao0ohw8y7dpf84 and this is what happened:

bcosc@bcosc.qr1hi:~$ arv-copy --src qr1hi --dst su92l --dst-git-repo bcosc.htslib qr1hi-d1hrv-nao0ohw8y7dpf84
WARNING: 'arvados' is an alias for 'qr1hi-s0uqq-izl5prv9uxccky9'
remote: Counting objects: 69372, done.
remote: Compressing objects: 100% (28461/28461), done.
remote: Total 69372 (delta 49747), reused 52281 (delta 36941)
Receiving objects: 100% (69372/69372), 16.21 MiB | 18.60 MiB/s, done.
Resolving deltas: 100% (49747/49747), done.
Warning: the RSA host key for 'git.su92l.arvadosapi.com' differs from the key for the IP address '130.211.123.131'
Offending key for IP in /home/bcosc/.ssh/known_hosts:9
Matching host key in /home/bcosc/.ssh/known_hosts:11
Are you sure you want to continue connecting (yes/no)? yes
WARNING: 'bcosc.htslib' is an alias for 'su92l-s0uqq-078etwjufn7vko0'
Total 0 (delta 0), reused 0 (delta 0)
To git@git.su92l.arvadosapi.com:bcosc.htslib.git
 * [new branch]      git_git_qr1hi_arvadosapi_com_arvados_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d -> git_git_qr1hi_arvadosapi_com_arvados_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d
Warning: the RSA host key for 'git.su92l.arvadosapi.com' differs from the key for the IP address '130.211.123.131'
Offending key for IP in /home/bcosc/.ssh/known_hosts:9
Matching host key in /home/bcosc/.ssh/known_hosts:11
Are you sure you want to continue connecting (yes/no)? yes
WARNING: 'bcosc.htslib' is an alias for 'su92l-s0uqq-078etwjufn7vko0'
Total 0 (delta 0), reused 0 (delta 0)
To git@git.su92l.arvadosapi.com:bcosc.htslib.git
 * [new branch]      git_git_qr1hi_arvadosapi_com_arvados_git_462fbba4ab742a72a3cf057dc06610a51af6b0f0 -> git_git_qr1hi_arvadosapi_com_arvados_git_462fbba4ab742a72a3cf057dc06610a51af6b0f0
2015-07-14 17:45:06 arvados.arv-copy[19694] INFO: 
2015-07-14 17:45:06 arvados.arv-copy[19694] INFO: Success: created copy with uuid su92l-d1hrv-rym2h5ub9m8ofwj

  • Not sure about this: “This part of the tutorial assumes that you are working from a machine with an SSH key that is allowed to access both clusters (required to copy git repositories)”. What exactly is needed here? I had the same ssh key added in both my qr1hi and 4xphq manage_account pages. However, I did not have anything in the ~/.ssh dir when I logged into my 4xphq shell account, and what I had in qr1hi is different from what I added in the manage_account page. And I was able to arv-copy in this situation. So I do not think this statement is quite comprehensive. Please clarify.

I think I made a mistake. In the description, Brett said that you need an SSH key to copy git repositories. I think you do not need this for pipeline instances/templates. I will add an example for copying git repositories.

Looks like you cannot copy repositories. I will just omit this sentence then.

arv-copy --src qr1hi --dst su92l qr1hi-s0uqq-13jl5o1hyobzhwb
2015-07-14 18:16:48 arvados.arv-copy[30313] INFO: arv-copy: cannot copy object qr1hi-s0uqq-13jl5o1hyobzhwb of type Repository
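
The two outcomes shown in this comment, a success line carrying the new uuid and a refusal for a Repository object, can be told apart mechanically. The sketch below is one way to do that when wrapping arv-copy in a script. The function name and the returned labels are hypothetical, and the patterns are matched only against the two log lines quoted in this comment, so other arv-copy messages fall through to "unknown".

```python
import re

# Patterns taken from the two INFO lines quoted in this comment.
SUCCESS_RE = re.compile(r"Success: created copy with uuid (\S+)")
REFUSED_RE = re.compile(r"cannot copy object (\S+) of type (\w+)")


def summarize_arv_copy_log(log_text):
    """Classify captured arv-copy output as copied, refused, or unknown."""
    match = SUCCESS_RE.search(log_text)
    if match:
        return {"outcome": "copied", "uuid": match.group(1)}
    match = REFUSED_RE.search(log_text)
    if match:
        return {
            "outcome": "refused",
            "uuid": match.group(1),
            "object_type": match.group(2),
        }
    return {"outcome": "unknown"}


ok_line = ("2015-07-14 17:45:06 arvados.arv-copy[19694] INFO: "
           "Success: created copy with uuid su92l-d1hrv-rym2h5ub9m8ofwj")
bad_line = ("2015-07-14 18:16:48 arvados.arv-copy[30313] INFO: arv-copy: "
            "cannot copy object qr1hi-s0uqq-13jl5o1hyobzhwb of type Repository")

print(summarize_arv_copy_log(ok_line))
print(summarize_arv_copy_log(bad_line))
```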

  • This statement: “As well as a git repository in the destination cluster”. Can you please say something like “this tutorial expects that you are working with the ‘tutorial’ repository created in ‘Adding a new arvados repository’ page”.
  • Please use “$USER/tutorial.git” repository in place of “samplegitrepo/samplename.git”

Done.

  • This statement is not clear to me: “New branches in the destination git repo will be created for each branch used in the pipeline template. For example, if your source branch was named branch_name, your new branch will be named git_git_cluster1_arvadosapi_com_reponame_git_branch_name.”
    • Would you be able to clarify what the branches are here? Is it possible to provide a real example by copying a real template?
    • Can you please replace the “reponame” here with “tutorial”, if that is what we are using in this tutorial.

Will do.
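
The branch names in the transcript earlier in this comment look like the source git URL and the commit hash joined together, with every run of punctuation turned into an underscore. The sketch below reproduces that pattern. It is an inference from the transcript, not a statement of how arv-copy is implemented. The source URL is also an assumption, modeled on the destination URL shown in the transcript, and the function name is hypothetical.

```python
import re


def guess_destination_branch(src_git_url, version):
    """Guess the destination branch name for a copied script version.

    Inferred rule: join URL and version with '_', then replace each run
    of non-word characters with a single '_'.
    """
    return re.sub(r"\W+", "_", f"{src_git_url}_{version}")


# Assumed source URL, modeled on the destination URL in the transcript.
SRC_URL = "git@git.qr1hi.arvadosapi.com:arvados.git"

for commit in (
    "ac21f0d45a76294aaca0c0c0fdf06eb72d03368d",
    "462fbba4ab742a72a3cf057dc06610a51af6b0f0",
):
    print(guess_destination_branch(SRC_URL, commit))
```

Under this reading, the "branches" in the quoted statement are really the script versions the pipeline pins, which in the transcript are commit hashes rather than named branches.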

  • From a shell terminal logged into qr1hi, I was able to use arv-copy when the src is qr1hi and dst is 4xphq, as well as when src is 4xphq and dst is qr1hi. Which is what is expected. Just wondering if this is something you would want to clarify that you can be logged into any of the shell accounts.

Will do.

  • I think it does not hurt to include a manage_account page picture to clarify where to find the API_TOKEN and uuid_prefix etc.

Done.

Thanks.

#13 Updated by Radhika Chippada over 6 years ago

  • lets -> let’s ?
  • “beta cloud instance qr1hi” Can we add the site url also? Something like “beta cloud instance qr1hi ( <a href="{{site.arvados_workbench_host}}/" target="_blank">{{site.arvados_workbench_host}}/</a> ) …” ?
  • “You can find the cluster name from the prefix of the uuid of the object you want to copy”. Can we add “for example, in qr1hi-4zz18-tci4vn4fa95w0zx …” ?
  • “Copy your ARVADOS_API_HOST and ARVADOS_API_TOKEN into the config files as shown below.” These config files need to be added only in the shell account from which we are executing the commands right? For example: to copy from qr1hi to 4xphq by logging into qr1hi, I just added them in qr1hi shell terminal. Is it something that you might want to clarify?
  • “The uuid can be found inside of a collection in the top left box, or from the URL bar” -> Something like “The uuid can be found in the collection display page in the collection summary area (top left box) …”
  • “You can find this collection in the lobSTR v.3 project on qr1hi.” -> Can you please use the site url here instead of qr1hi?
  • “The output of arv-copy displays the uuid of the collection generated” -> “The output of arv-copy displays the uuid of the collection generated in the destination cluster” ?
  • Do you mind adding an example for “If you want to place your collection inside of a pre-created project, you can specify the project you want it to be in using the tag --project-uuid followed by the project uuid.” ? The project uuid can be xxxxx-j7d0g-yyyyy…
  • use the ‘tutorial’ repository -> Can you please bold the word tutorial “use the tutorial repository “
  • “To :bcosc.htslib.git” Can you please rename su92l with dst_cluster or something? Not sure if su92l is public?
  • “git_git_cluster1_arvadosapi_com_reponame_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d.” -> I think it should say “qr1hi” instead of cluster1?
  • Please add an example to copy just the object (non-recursive) for a pipeline instance (an object with dependencies)?
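
The two requests above for extra examples (placing the copy in a project, and copying only the object itself) can be sketched as one command builder. Several things here are assumptions: the function name is hypothetical, the project uuid is a made-up placeholder in the xxxxx-j7d0g-yyyyy shape mentioned above, and the non-recursive flag is assumed to be spelled --no-recursive, with arv-copy --help as the authority. The --dst-git-repo option and the other values are taken from the transcript in comment #12. Nothing is executed.

```python
def pipeline_copy_command(src, dst, object_uuid, dst_git_repo,
                          project_uuid=None, recursive=True):
    """Build an arv-copy argv for a pipeline object (nothing is executed)."""
    argv = ["arv-copy", "--src", src, "--dst", dst,
            "--dst-git-repo", dst_git_repo]
    if project_uuid is not None:
        # Project uuids in this thread carry the j7d0g type code.
        if project_uuid.split("-")[1:2] != ["j7d0g"]:
            raise ValueError(f"not a project uuid: {project_uuid!r}")
        argv += ["--project-uuid", project_uuid]
    if not recursive:
        # Assumed flag spelling; confirm with `arv-copy --help`.
        argv.append("--no-recursive")
    argv.append(object_uuid)
    return argv


deep = pipeline_copy_command(
    "qr1hi", "dst_cluster", "qr1hi-d1hrv-nao0ohw8y7dpf84", "bcosc.htslib")
shallow = pipeline_copy_command(
    "qr1hi", "dst_cluster", "qr1hi-d1hrv-nao0ohw8y7dpf84", "bcosc.htslib",
    project_uuid="dst_cluster-j7d0g-0123456789abcde",  # hypothetical project
    recursive=False)

print(" ".join(deep))
print(" ".join(shallow))
```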

Thanks

#14 Updated by Brett Smith over 6 years ago

Bryan Cosca wrote:

Radhika Chippada wrote:

  • Not sure about this: “This part of the tutorial assumes that you are working from a machine with an SSH key that is allowed to access both clusters (required to copy git repositories)”. What exactly is needed here? I had the same ssh key added in both my qr1hi and 4xphq manage_account pages. However, I did not have anything in the ~/.ssh dir when I logged into my 4xphq shell account, and what I had in qr1hi is different from what I added in the manage_account page. And I was able to arv-copy in this situation. So I do not think this statement is quite comprehensive. Please clarify.

I think I made a mistake. In the description, Brett said that you need an SSH key to copy git repositories. I think you do not need this for pipeline instances/templates.

Maybe this has changed, but last time I checked, you do need this. In order to copy a pipeline, you have to copy its component scripts; and in order to do that, you need to be able to pull from the source git repository(ies), and push to the destination repository.

If Radhika was using ssh -A to go to the shell machine, or had agent forwarding (ForwardAgent) turned on in her corresponding SSH options, that would explain how arv-copy worked. The shell sessions had access to the underlying key, even though ~/.ssh was empty on the host.
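
One quick way to check for this situation from inside a shell session is to look for the SSH_AUTH_SOCK environment variable, which a running or forwarded agent sets. The sketch below does only that. The helper names are hypothetical, and a set variable shows that an agent is reachable, not that it holds a key both clusters accept.

```python
import os


def agent_socket(environ=None):
    """Return the SSH agent socket path visible to this session, or None."""
    environ = os.environ if environ is None else environ
    return environ.get("SSH_AUTH_SOCK") or None


def explain_key_access(environ=None):
    """Describe where git over SSH would have to find a key."""
    sock = agent_socket(environ)
    if sock:
        return (f"An SSH agent is reachable via {sock}; "
                "git can use keys that are not stored on this host.")
    return ("No SSH agent in this session; "
            "git needs a usable key under ~/.ssh on this host.")


print(explain_key_access())
```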

#15 Updated by Bryan Cosca over 6 years ago

Radhika Chippada wrote:

  • lets -> let’s ?

Done

  • “beta cloud instance qr1hi” Can we add the site url also? Something like “beta cloud instance qr1hi ( <a href="{{site.arvados_workbench_host}}/" target="_blank">{{site.arvados_workbench_host}}/</a> ) …” ?

Done

  • “You can find the cluster name from the prefix of the uuid of the object you want to copy”. Can we add “for example, in qr1hi-4zz18-tci4vn4fa95w0zx …” ?

Done.

  • “Copy your ARVADOS_API_HOST and ARVADOS_API_TOKEN into the config files as shown below.” These config files need to be added only in the shell account from which we are executing the commands right? For example: to copy from qr1hi to 4xphq by logging into qr1hi, I just added them in qr1hi shell terminal. Is it something that you might want to clarify?

Done.

  • “The uuid can be found inside of a collection in the top left box, or from the URL bar” -> Something like “The uuid can be found in the collection display page in the collection summary area (top left box) …”

Done.

  • “You can find this collection in the lobSTR v.3 project on qr1hi.” -> Can you please use the site url here instead of qr1hi?

Done.

  • “The output of arv-copy displays the uuid of the collection generated” -> “The output of arv-copy displays the uuid of the collection generated in the destination cluster” ?

Done.

  • Do you mind adding an example for “If you want to place your collection inside of a pre-created project, you can specify the project you want it to be in using the tag --project-uuid followed by the project uuid.” ? The project uuid can be xxxxx-j7d0g-yyyyy…

Done.

  • use the ‘tutorial’ repository -> Can you please bold the word tutorial “use the tutorial repository “

Done.

  • “To :bcosc.htslib.git” Can you please rename su92l with dst_cluster or something? Not sure if su92l is public?

Done.

  • “git_git_cluster1_arvadosapi_com_reponame_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d.” -> I think it should say “qr1hi” instead of cluster1?

Done.

  • Please add an example to copy just the object (non-recursive) for a pipeline instance (an object with dependencies)?

Done.

Thanks!

#16 Updated by Radhika Chippada over 6 years ago

Looks very good. Just a couple more minor things.

  • Typo in “The uuid can be found in the collection displace page in the collection summary area”: Used the word “displace” instead of “display”
  • In “For example, in qr1hi-4zz18-tci4vn4fa95w0zx, the cluster name is qr1hi”, I think bold font for the uuid and qr1hi might be helpful
  • In “The names of the files must have the format of uuid_prefix.conf”: can you please use bold font for .conf as well (bold the whole uuid_prefix.conf)
  • “inside of a pre-created project” => “in a pre-created project”?

You can merge after you address these (with or without). No need for me to review again. Thank you for this very useful doc addition.

#17 Updated by Bryan Cosca over 6 years ago

  • Status changed from In Progress to Resolved

#18 Updated by Bryan Cosca over 6 years ago

Thanks for reviewing my first commit :)
