Bug #6772
open [API] Should not be necessary to host git repos on the same host as API server
Description
Background
Currently, gitolite and arv-git-httpd must be installed on the same node as the API server.
This contributes to the undesirable rule that a site can have only one host running an API server.
Implementation
When arvados-git-httpd is in use, hosted repositories should be treated the same way as remote repositories in source:services/api/app/models/commit.rb: i.e., when validating a job submission, fetch the repository from arvados-git-httpd and put it in the local cache.
Extend Commit.git_dir_for to return remote=true for locally hosted repos when config.git_repo_https_base is not false.
In Commit.fetch_remote_repository, call remote_url? to decide whether this is a local repo; if so, look up the repo and call https_clone_url to get the remote URL.
Update Commit.must_git to accept a "use token as credential?" argument and set up the git credentials accordingly, using an env var and a credential helper as in source:services/arv-git-httpd/integration_test.go (but presumably it would be more race-safe to specify the helper with a command line argument instead of running git config to edit a config file).
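The command-line credential helper idea could look roughly like the sketch below. It is not the actual Commit.must_git code: the function name git_command_with_token, the env var name ARVADOS_API_TOKEN, and the placeholder username "none" are all assumptions made for illustration. It relies only on git's documented `-c name=value` option and on the rule that a credential.helper value starting with `!` is run as a shell snippet with the operation (get, store, erase) appended.

```ruby
# Hypothetical helper, not the real Commit.must_git.
# Returns [env, argv] for a git invocation. When a token is given, the
# password is supplied by a credential helper passed via `git -c`, so no
# config file is edited and concurrent jobs cannot race on it.
def git_command_with_token(git_args, token: nil)
  env = {}
  argv = ["git"]
  if token
    # Assumed env var name; the helper reads it when git asks for credentials.
    env["ARVADOS_API_TOKEN"] = token
    # Shell helper: discard git's request on stdin, answer only "get".
    # The username "none" is an assumed placeholder.
    helper = '!cred(){ cat >/dev/null; if [ "$1" = get ]; then ' \
             'echo username=none; echo "password=$ARVADOS_API_TOKEN"; fi; };cred'
    argv += ["-c", "credential.helper=#{helper}"]
  end
  [env, argv + git_args]
end

env, argv = git_command_with_token(["fetch", "--no-progress", "origin"],
                                   token: "example-token")
# system(env, *argv) would then run git with the token present only in the
# child's environment, never in argv or in a config file.
```

Because the token travels in the environment and the helper is given per invocation, two concurrent fetches with different tokens do not overwrite each other's settings.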
If the API server config has git_repo_https_base: false, the previous behavior should continue to work.
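A minimal sketch of that decision, under stated assumptions: SketchConfig, SketchRepo, and resolve_repo are hypothetical stand-ins rather than the real API server classes, git_repo_https_base is assumed to be either false or a base URL string, and both the URL layout (base + name + ".git") and the on-disk layout (uuid + ".git") are illustrative guesses.

```ruby
# Hypothetical stand-ins for the API server config and a hosted repository.
SketchConfig = Struct.new(:git_repo_https_base, :git_repositories_dir)
SketchRepo = Struct.new(:uuid, :name)

# Returns [location, remote] in the spirit of Commit.git_dir_for:
# remote is true when the repo should be fetched over HTTPS into the local
# cache, false when the bare repo is read directly from local disk.
def resolve_repo(repo, config)
  base = config.git_repo_https_base
  if base == false
    # Previous behavior: the repo lives on the API server's own disk.
    [File.join(config.git_repositories_dir, "#{repo.uuid}.git"), false]
  else
    # New behavior: treat the hosted repo like any other remote repo.
    ["#{base.chomp('/')}/#{repo.name}.git", true]
  end
end

cfg = SketchConfig.new("https://git.example.com/", "/var/lib/git")
repo = SketchRepo.new("zzzzz-s0uqq-0123456789abcde", "active/foo")
location, remote = resolve_repo(repo, cfg)
```

Keeping the false branch unchanged means existing single-host installations need no config change.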
Optional optimization
With the naïve approach, when arv-git-httpd authenticates, there will be three HTTP connections open: client->API->arv-git-httpd->API. This could be avoided or minimized by having arv-git-httpd bypass per-repository authentication entirely when given a special pre-shared secret token (similar to keepstore's data manager token), or by having it cache credentials and sometimes skip API lookups when fetching a repo by UUID.
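The credential-caching variant might be sketched as a small time-limited cache keyed on token and repository UUID. This is an illustration only: arv-git-httpd is a Go service, and the class name PermissionCache, its fetch method, and the TTL value are hypothetical. Ruby is used here just to match the API server code discussed above.

```ruby
# Hypothetical TTL cache for "may this token read this repo?" answers.
class PermissionCache
  def initialize(ttl_seconds,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock      # injectable so expiry can be tested without sleeping
    @entries = {}
  end

  # Returns the cached answer if it is fresh; otherwise runs the block
  # (standing in for the API lookup) and caches its result.
  def fetch(token, repo_uuid)
    key = [token, repo_uuid]
    now = @clock.call
    entry = @entries[key]
    return entry[:allowed] if entry && now - entry[:at] < @ttl
    allowed = yield
    @entries[key] = { allowed: allowed, at: now }
    allowed
  end
end

cache = PermissionCache.new(60)
lookups = 0
2.times { cache.fetch("example-token", "zzzzz-s0uqq-0123456789abcde") { lookups += 1; true } }
# lookups is 1 here: the second call was served from the cache.
```

The trade-off is staleness: a revoked permission can still be honored until the entry expires, so the TTL would need to stay short. A real implementation would probably also key on a hash of the token rather than holding the token itself in memory.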