Story #3126

[API] Support use of anonymous git url (like github https) as repository in jobs.create

Added by Tom Clegg over 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Start date:
04/09/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
2.0

Description

Currently, crunch-job can do this when running local jobs.

That "fetch git objects from remote repository" logic needs to be
  • Ported to ruby
  • Adjusted to suit API server's use (e.g., the commits should be pulled into the "arvados internal" repository)
  • Invoked whenever job's script_version or repository is set/changed

The default "arvados" repository used in tutorials/examples should be changed to "https://github.com/curoverse/arvados.git" so they don't depend on your site's local mirror of the arvados repository being up-to-date.

Likewise, the arvados_sdk_version feature should use the github url instead of the locally hosted arvados.git.


Subtasks

Task #5635: Validate repository name as name, url, or uuid during job#saveResolvedTom Clegg

Task #5636: Fetch remote repo to internal.git if needed during job#saveResolvedTom Clegg

Task #5690: Update docsResolvedTom Clegg

Task #5691: Review 3126-remote-git-urlResolvedTom Clegg


Related issues

Related to Arvados - Feature #4027: [Crunch] Accept SDK version as a runtime constraint. Install SDK into the docker container before running tasks.Resolved11/11/2014

Associated revisions

Revision 86e078ae
Added by Tom Clegg almost 6 years ago

Merge branch '3126-remote-git-url' closes #3126

Revision c674deff (diff)
Added by Tom Clegg almost 6 years ago

Revert errant change in f8e6cb3. refs #3126

History

#1 Updated by Tom Clegg over 6 years ago

  • Target version set to Arvados Future Sprints

#2 Updated by Tom Clegg about 6 years ago

  • Subject changed from Support use of anonymous git url (like github https) as job repository to [API] Support use of anonymous git url (like github https) as repository in jobs.create

#3 Updated by Tom Clegg about 6 years ago

  • Description updated (diff)
  • Category set to API

#4 Updated by Sarah Guthrie almost 6 years ago

bump This would be really useful for me!

#5 Updated by Ward Vandewege almost 6 years ago

  • Target version changed from Arvados Future Sprints to 2015-04-01 sprint

#6 Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2015-04-01 sprint to 2015-04-29 sprint

#7 Updated by Tom Clegg almost 6 years ago

  • Assigned To set to Tom Clegg

#8 Updated by Tom Clegg almost 6 years ago

Notes about 3126-remote-git-url:

  • The commit+tree needed for a job is now copied into the internal repository during the job create/update transaction, regardless of whether it is hosted locally or remotely.
  • crunch-dispatch already checks the "internal" repository before copying objects from (or even looking at) a locally hosted repository, and passes the internal repository path to crunch-job with --git-dir so crunch-job uses it to do git archive, so we didn't need any changes there. The above point means the git tag and git fetch-pack code in crunch-dispatch can be removed (except perhaps a sanity check on the git tag) but I figure this might as well wait a bit, to avoid deployment races. We should stop seeing any "fetch-pack" messages in crunch-dispatch logs, which will assure us those bits aren't used any more.
  • git fetch-pack can't fetch a revision by sha1. I used git pack-objects | git unpack-objects rather than copying the entire remote repository into the internal repo with fetch-pack --all like crunch-dispatch did.
  • Using a commit from a remote repository requires copying git objects twice: clone/fetch the remote (with all tags and branches) into a cache dir dedicated to that remote in order to resolve a refspec to a sha1, then copy the necessary objects from the cache dir to the internal git dir. Fetching remote repositories directly into internal.git would have been nice and simple if it weren't for two fatal flaws:
    1. In order to look up arbitrary revisions, we have to copy remote refs into a local git-dir: git ls-remote can look up plain branches like master, but not master~3 or 78be33b. Troublesome races would occur if two concurrent transactions fetched two different remote repositories' branches into the same local git-dir and looked up "master".
    2. Even without races, an abbreviated sha1 version in a job submitted by user B could easily resolve to a commit in user A's private repository which was imported for a previous job. Essentially, it's never safe to look up anything in the shared "internal" repository except a full SHA-1 for which the current user's permission has already been established.

#9 Updated by Peter Amstutz almost 6 years ago

78be33b

dispatch: 4n8aq-8i9sb-798lzzemsyw4myr: Repository https://github.com/curoverse/arvados.git not found under /home/peter/work/arvados_prod_repos
/home/peter/work/arvados/services/api/app/models/arvados_model.rb:328:in `ensure_permission_to_save': ArvadosModel::PermissionDeniedError (ArvadosModel::PermissionDeniedError)

Is there a newer revision, or should I look into this error to find out what's wrong?

#10 Updated by Tom Clegg almost 6 years ago

Ah, sorry -- "we didn't need any changes there" was not true! Fixed, rebased on current master, now at 85378b8.

#11 Updated by Peter Amstutz almost 6 years ago

I'm getting API test failures:

  2) Failure:
CommitTest#test_find_commit_range_uses_fetch_remote_repository_to_get_git://github.com/curoverse/arvados.git [/home/peter/work/arvados/services/api/test/unit/commit_test.rb:35]:
Expected [] to not be empty.

  3) Failure:
CommitTest#test_find_commit_range_uses_fetch_remote_repository_to_get_http://github.com/curoverse/arvados.git [/home/peter/work/arvados/services/api/test/unit/commit_test.rb:35]:
Expected [] to not be empty.

419 tests, 1676 assertions, 2 failures, 0 errors, 2 skips

#12 Updated by Tom Clegg almost 6 years ago

I installed a new jessie system, had test failures, and fixed (now at de3e44a). But my https:// test failed too (not just http:// and git://) and the fix was a leaked-state-from-previous-test issue, so I'm not certain I've seen the same failure mode as note-11.

Worth a try, though.

(Also: if you get all three failures (git,http,https) with 85378b8 after removing services/api/tmp/git, that would be fairly reassuring that we were seeing the same issue.)

#13 Updated by Peter Amstutz almost 6 years ago

Tests are passing now, de3e44a LGTM

#14 Updated by Tom Clegg almost 6 years ago

  • Status changed from New to Resolved
  • % Done changed from 75 to 100

Applied in changeset arvados|commit:86e078ae126f6651428219c726c34da3bd7f7495.

Also available in: Atom PDF