Bug #6772

[API] Should not be necessary to host git repos on the same host as API server

Added by Tom Clegg over 5 years ago. Updated about 4 years ago.


Currently, gitolite and arv-git-httpd must be installed on the same node as the API server.

This contributes to the undesirable rule that a site can have only one host running an API server.


When arvados-git-httpd is in use, hosted repositories should be treated the same way as remote repositories in source:services/api/app/models/commit.rb. That is, when validating a job submission, the API server should fetch the repository from arvados-git-httpd and put it in the local cache.

Extend Commit.git_dir_for to return remote=true for locally hosted repos when config.git_repo_https_base is not false.

Extend Commit.fetch_remote_repository:
  • call remote_url? to decide whether this is a local repo
  • if so, look up the repo and call https_clone_url to get the remote URL
  • update Commit.must_git to accept a "use token as credential?" argument and set up the git credentials accordingly, using an env var and a credential helper as in source:services/arv-git-httpd/integration_test.go. Specifying the helper with a command line argument would presumably be more race-safe than running git config to edit a config file.

If the API server config has git_repo_https_base: false the previous behavior should continue to work.
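
The decision described above can be sketched as follows. This is a Python illustration, not the Rails code in commit.rb: `Repo`, `resolve_git_source`, and the `local_git_dir` field are hypothetical stand-ins, and only the `git_repo_https_base` setting and the idea of a `remote` flag come from this ticket.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Repo:
    """Hypothetical stand-in for a hosted repository record."""
    uuid: str
    local_git_dir: str    # path used by the previous same-host behavior
    https_clone_url: str  # URL served by arv-git-httpd


def resolve_git_source(repo: Repo,
                       git_repo_https_base: Union[str, bool, None]) -> Tuple[str, bool]:
    """Return (location, remote) for a hosted repository.

    remote=True means the caller should fetch over HTTPS into its local cache.
    remote=False means the caller reads the git directory on local disk.
    """
    if git_repo_https_base is False:
        # Explicitly disabled: keep the previous same-host behavior.
        return repo.local_git_dir, False
    # Any other value ("not false") routes the repo through the remote path.
    return repo.https_clone_url, True
```

With this shape, a site that sets `git_repo_https_base: false` keeps reading from local disk, and every other configuration takes the same fetch-and-cache path as an external repository.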

Optional optimization

With the naïve approach, when arv-git-httpd authenticates, there will be three HTTP connections open: client->API->arv-git-httpd->API. This could be avoided or minimized by having arv-git-httpd bypass per-repository authentication entirely when given a special pre-shared secret token (similar to keepstore's data manager token), or by having it cache credentials and sometimes skip API lookups when fetching a repo by UUID.
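
The credential-caching variant could look roughly like the sketch below. It is written in Python for illustration only and is not arv-git-httpd code: `CredentialCache`, its `check` method, and the `lookup` callback (standing in for the API permission lookup) are all hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class CredentialCache:
    """Remember recent (token, repo UUID) permission checks for a short time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps (token, repo_uuid) to (expiry time, allowed?).
        self._entries: Dict[Tuple[str, str], Tuple[float, bool]] = {}

    def check(self, token: str, repo_uuid: str,
              lookup: Callable[[str, str], bool]) -> bool:
        """Return the cached answer if still fresh, else call lookup and cache it."""
        key = (token, repo_uuid)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]
        allowed = lookup(token, repo_uuid)  # hypothetical API round trip
        self._entries[key] = (now + self._ttl, allowed)
        return allowed
```

The trade-off is that a permission change can take up to `ttl_seconds` to be noticed, so the TTL would need to be short.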


#1 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)

#2 Updated by Brett Smith about 5 years ago

This got a big bump in priority because fixing it is expected to enable us to deploy multiple Rails servers, improving the reliability of the entire cluster.

#3 Updated by Tom Clegg about 4 years ago

  • Description updated (diff)

#4 Updated by Tom Morris about 4 years ago

  • Assigned To set to Tom Morris

#5 Updated by Tom Morris about 4 years ago

  • Assigned To deleted (Tom Morris)

#6 Updated by Peter Amstutz about 4 years ago

From arv-copy, setting up token credentials entirely on the command line:

            git_config = ["-c", "credential.%s/.username=none" % baseurl,
                          "-c", "credential.%s/.helper=!cred(){ cat >/dev/null; if [ \"$1\" = get ]; then echo password=$ARVADOS_API_TOKEN; fi; };cred" % baseurl]

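A minimal sketch of how the arv-copy options above might be wrapped for reuse, assuming the token is handed to git only through the child process environment. The function names `git_token_args` and `git_with_token` are hypothetical; the two `-c` settings are the ones quoted from arv-copy.

```python
import os
import subprocess
from typing import Dict, List, Optional

# Shell function used as the credential helper. Git runs it with the operation
# name ("get", "store", "erase") as $1 and the request details on stdin.
CRED_HELPER = ('!cred(){ cat >/dev/null; '
               'if [ "$1" = get ]; then echo password=$ARVADOS_API_TOKEN; fi; '
               '};cred')


def git_token_args(baseurl: str) -> List[str]:
    """Build the -c options that make git answer auth requests for baseurl."""
    return [
        "-c", "credential.%s/.username=none" % baseurl,
        "-c", "credential.%s/.helper=%s" % (baseurl, CRED_HELPER),
    ]


def git_with_token(baseurl: str, token: str, git_args: List[str],
                   env: Optional[Dict[str, str]] = None) -> subprocess.CompletedProcess:
    """Run git with the token set only in the child's environment."""
    child_env = dict(os.environ if env is None else env)
    child_env["ARVADOS_API_TOKEN"] = token
    argv = ["git"] + git_token_args(baseurl) + list(git_args)
    # No shell is involved here, so the helper string needs no extra quoting.
    return subprocess.run(argv, env=child_env, check=True)
```

Because the helper is supplied with `-c` on each invocation, nothing is written to a git config file, which is the race-safety property the description asks for. A call might look like `git_with_token("https://git.example.com", token, ["fetch", "origin"])`, where the host name is a placeholder.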