Project

General

Profile

Actions

Idea #3126

closed

[API] Support use of anonymous git url (like github https) as repository in jobs.create

Added by Tom Clegg almost 10 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Start date:
04/09/2015
Due date:
Story points:
2.0

Description

Currently, crunch-job can do this when running local jobs.

That "fetch git objects from remote repository" logic needs to be
  • Ported to ruby
  • Adjusted to suit API server's use (e.g., the commits should be pulled into the "arvados internal" repository)
  • Invoked whenever job's script_version or repository is set/changed

The default "arvados" repository used in tutorials/examples should be changed to "https://github.com/curoverse/arvados.git" so they don't depend on your site's local mirror of the arvados repository being up-to-date.

Likewise, the arvados_sdk_version feature should use the github url instead of the locally hosted arvados.git.


Subtasks 4 (0 open4 closed)

Task #5635: Validate repository name as name, url, or uuid during job#saveResolvedTom Clegg04/10/2015Actions
Task #5636: Fetch remote repo to internal.git if needed during job#saveResolvedTom Clegg04/09/2015Actions
Task #5690: Update docsResolvedTom Clegg04/09/2015Actions
Task #5691: Review 3126-remote-git-urlResolvedTom Clegg04/09/2015Actions

Related issues

Related to Arvados - Feature #4027: [Crunch] Accept SDK version as a runtime constraint. Install SDK into the docker container before running tasks.ResolvedBrett Smith11/11/2014Actions
Actions #1

Updated by Tom Clegg almost 10 years ago

  • Target version set to Arvados Future Sprints
Actions #2

Updated by Tom Clegg over 9 years ago

  • Subject changed from Support use of anonymous git url (like github https) as job repository to [API] Support use of anonymous git url (like github https) as repository in jobs.create
Actions #3

Updated by Tom Clegg over 9 years ago

  • Description updated (diff)
  • Category set to API
Actions #4

Updated by Sarah Guthrie about 9 years ago

bump This would be really useful for me!

Actions #5

Updated by Ward Vandewege about 9 years ago

  • Target version changed from Arvados Future Sprints to 2015-04-01 sprint
Actions #6

Updated by Tom Clegg about 9 years ago

  • Target version changed from 2015-04-01 sprint to 2015-04-29 sprint
Actions #7

Updated by Tom Clegg about 9 years ago

  • Assigned To set to Tom Clegg
Actions #8

Updated by Tom Clegg about 9 years ago

Notes about 3126-remote-git-url:

  • The commit+tree needed for a job is now copied into the internal repository during the job create/update transaction, regardless of whether it is hosted locally or remotely.
  • crunch-dispatch already checks the "internal" repository before copying objects from (or even looking at) a locally hosted repository, and passes the internal repository path to crunch-job with --git-dir so crunch-job uses it to do git archive, so we didn't need any changes there. The above point means the git tag and git fetch-pack code in crunch-dispatch can be removed (except perhaps a sanity check on the git tag) but I figure this might as well wait a bit, to avoid deployment races. We should stop seeing any "fetch-pack" messages in crunch-dispatch logs, which will assure us those bits aren't used any more.
  • git fetch-pack can't fetch a revision by sha1. I used git pack-objects | git unpack-objects rather than copying the entire remote repository into the internal repo with fetch-pack --all like crunch-dispatch did.
  • Using a commit from a remote repository requires copying git objects twice: clone/fetch the remote (with all tags and branches) into a cache dir dedicated to that remote in order to resolve a refspec to a sha1, then copy the necessary objects from the cache dir to the internal git dir. Fetching remote repositories directly into internal.git would have been nice and simple if it weren't for two fatal flaws:
    1. In order to look up arbitrary revisions, we have to copy remote refs into a local git-dir: git ls-remote can look up plain branches like master, but not master~3 or 78be33b. Troublesome races would occur if two concurrent transactions fetched two different remote repositories' branches into the same local git-dir and looked up "master".
    2. Even without races, an abbreviated sha1 version in a job submitted by user B could easily resolve to a commit in user A's private repository which was imported for a previous job. Essentially, it's never safe to look up anything in the shared "internal" repository except a full SHA-1 for which the current user's permission has already been established.
Actions #9

Updated by Peter Amstutz about 9 years ago

78be33b

dispatch: 4n8aq-8i9sb-798lzzemsyw4myr: Repository https://github.com/curoverse/arvados.git not found under /home/peter/work/arvados_prod_repos
/home/peter/work/arvados/services/api/app/models/arvados_model.rb:328:in `ensure_permission_to_save': ArvadosModel::PermissionDeniedError (ArvadosModel::PermissionDeniedError)

Is there a newer revision, or should I look into this error to find out what's wrong?

Actions #10

Updated by Tom Clegg about 9 years ago

Ah, sorry -- "we didn't need any changes there" was not true! Fixed, rebased on current master, now at 85378b8.

Actions #11

Updated by Peter Amstutz about 9 years ago

I'm getting API test failures:

  2) Failure:
CommitTest#test_find_commit_range_uses_fetch_remote_repository_to_get_git://github.com/curoverse/arvados.git [/home/peter/work/arvados/services/api/test/unit/commit_test.rb:35]:
Expected [] to not be empty.

  3) Failure:
CommitTest#test_find_commit_range_uses_fetch_remote_repository_to_get_http://github.com/curoverse/arvados.git [/home/peter/work/arvados/services/api/test/unit/commit_test.rb:35]:
Expected [] to not be empty.

419 tests, 1676 assertions, 2 failures, 0 errors, 2 skips
Actions #12

Updated by Tom Clegg about 9 years ago

I installed a new jessie system, had test failures, and fixed (now at de3e44a). But my https:// test failed too (not just http:// and git://) and the fix was a leaked-state-from-previous-test issue, so I'm not certain I've seen the same failure mode as note-11.

Worth a try, though.

(Also: if you get all three failures (git,http,https) with 85378b8 after removing services/api/tmp/git, that would be fairly reassuring that we were seeing the same issue.)

Actions #13

Updated by Peter Amstutz about 9 years ago

Tests are passing now, de3e44a LGTM

Actions #14

Updated by Tom Clegg about 9 years ago

  • Status changed from New to Resolved
  • % Done changed from 75 to 100

Applied in changeset arvados|commit:86e078ae126f6651428219c726c34da3bd7f7495.

Actions

Also available in: Atom PDF