Bug #6605

[SDKs] arv-copy can copy git repositories hosted by arv-git-httpd with no SSH key installed

Added by Brett Smith about 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Start date:
07/13/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

Functional requirements:

  • When reading Git repositories from the source cluster, it should try to read it through all the channels specified in the repository object's clone_urls field, in the order they're given.
  • When pushing Git repositories to the destination cluster, it should try pushing by both SSH and HTTPS, unless it has good reason to believe that one of these mechanisms won't work (see below).
  • If arv-copy tries one copying method that fails, and then the second copying method succeeds, the user should only see errors from the first method if debug output is enabled.

Nonfunctional requirements to be addressed:

  • Is there anything arv-copy can do to detect whether or not arv-git-httpd is available on the destination cluster? If so, should it do that detection, or simply assume that arv-git-httpd is always available and cope with failure?
    • One possibility: get a list of one repository, any repository, and check its clone_urls. Assume that the same push mechanisms are available for all repositories on the cluster (currently a good assumption). If clone_urls aren't available, fall back on whatever push_url advertises.
  • Should arv-copy use heuristics to determine which copying method is preferable or more likely to work before trying one or the other? (e.g., prefer arv-git-httpd if the user has no AuthorizedKeys?)
  • If both channels seem equally likely (or equally unlikely) to work, which should arv-copy try first?
  • How should errors and warnings be handled so they can be shown to the user in failure cases, but suppressed if another copy method succeeds?

Subtasks

Task #8351: Review 6605-arv-copy-httpResolvedTom Clegg


Related issues

Blocks Arvados - Bug #6606: [Workbench] Provide a GUI interface to arv-copyNew07/13/2015

Associated revisions

Revision d77bf0c6
Added by Peter Amstutz over 3 years ago

Merge branch '6605-arv-copy-http' closes #6605

History

#1 Updated by Brett Smith about 4 years ago

  • Description updated (diff)

#2 Updated by Brett Smith about 4 years ago

  • Description updated (diff)

#3 Updated by Peter Amstutz over 3 years ago

  • Status changed from New to In Progress

#4 Updated by Peter Amstutz over 3 years ago

Suggest the following behavior:

Prefer HTTPS if available by looking at clone_urls. Rationale: HTTPS is more likely to work than ssh because it only requires the API token and we already know it works. It is harder to reliable determine if SSH is going to work, need to probing the local ssh keys and the authorized_keys table).

#5 Updated by Peter Amstutz over 3 years ago

  • Status changed from In Progress to New

#6 Updated by Peter Amstutz over 3 years ago

Partial fix on branch 6605-arv-copy-http (uses http in preference to ssh, but doesn't fall back to trying ssh if http fails).

#7 Updated by Peter Amstutz over 3 years ago

  • Prefer HTTPS if available, try ssh if not available or HTTPS fails.
  • Tell git not to ask for password

#8 Updated by Peter Amstutz over 3 years ago

  • Story points set to 0.5

#9 Updated by Brett Smith over 3 years ago

  • Target version changed from Arvados Future Sprints to 2016-02-17 Sprint

#10 Updated by Brett Smith over 3 years ago

  • Assigned To set to Peter Amstutz

Changes in #6605-7 need to be done before this is reviewed. Other parts have been split into separate stories.

#11 Updated by Brett Smith over 3 years ago

  • Target version changed from 2016-02-17 Sprint to 2016-03-02 sprint

#12 Updated by Peter Amstutz over 3 years ago

6605: Check to see if each git URL works by using git ls-remote.

Preferentially use https.

Set GIT_ASKPASS=/bin/false to suppress asking for credentials.

#13 Updated by Tom Clegg over 3 years ago

6605-arv-copy-http 5fb73e1

I'm not keen on editing the user's global git config here. This edit seems pretty safe, but it's still rather surprising as a silent side effect of running arv-copy. Could we easily detect whether the credential helper is already in place, and if not, set it up and say something on stderr? Or perhaps just use "-c credential.blah.helper=blah" whenever invoking git, and avoid editing config at all.

If we stick with editing configs, since we're going to try all the http_urls if >1 are given, how about

-    if http_url:
-        setup_git_http(http_url[0])
+    for url in http_url:
+        setup_git_http(url)

...or even do it in the existing "for url in" loop, so we only do it for the urls that we actually try using.

If you feel like adding scope (but not if you don't) we could detect whether the repository is https://github.com/foo/bar and skip all the copying stuff (it looks like currently it'll just crash "not found" in that case).

#14 Updated by Peter Amstutz over 3 years ago

Tom Clegg wrote:

6605-arv-copy-http 5fb73e1

I'm not keen on editing the user's global git config here. This edit seems pretty safe, but it's still rather surprising as a silent side effect of running arv-copy. Could we easily detect whether the credential helper is already in place, and if not, set it up and say something on stderr? Or perhaps just use "-c credential.blah.helper=blah" whenever invoking git, and avoid editing config at all.

Thanks, "-c" is a much better solution, I didn't realize it was an option. Done.

If you feel like adding scope (but not if you don't) we could detect whether the repository is https://github.com/foo/bar and skip all the copying stuff (it looks like currently it'll just crash "not found" in that case).

Unfortunately it's not that easy, for this to work it also needs to do the right thing when setting the repository and script_version fields on the destination side, and that's when I decided the hole was looking a little bit dark and ratty.

#15 Updated by Tom Clegg over 3 years ago

Peter Amstutz wrote:

If you feel like adding scope (but not if you don't) we could detect whether the repository is https://github.com/foo/bar and skip all the copying stuff (it looks like currently it'll just crash "not found" in that case).

Unfortunately it's not that easy, for this to work it also needs to do the right thing when setting the repository and script_version fields on the destination side, and that's when I decided the hole was looking a little bit dark and ratty.

Ah. I thought "the right thing" would be nothing / leave them alone... but anyway, fine with punting.

I don't like conflating the existing meaning of ARVADOS_API_HOST_INSECURE ("accept any cert at https://api") with this new usage ("allow fetching/pushing repos using http://"). Perhaps an --allow-git-http flag would be more clear, and possible for a user to discover in some way other than trial & error?

#16 Updated by Peter Amstutz over 3 years ago

Tom Clegg wrote:

I don't like conflating the existing meaning of ARVADOS_API_HOST_INSECURE ("accept any cert at https://api") with this new usage ("allow fetching/pushing repos using http://"). Perhaps an --allow-git-http flag would be more clear, and possible for a user to discover in some way other than trial & error?

Added --allow-git-http-src and --allow-git-http-dst instead of looking at ARVADOS_API_HOST_INSECURE. Now at e71b6d2

#17 Updated by Peter Amstutz over 3 years ago

  • Status changed from New to Resolved

Applied in changeset arvados|commit:d77bf0c67422a259afacc17660698729328a1ed3.

Also available in: Atom PDF