Project

General

Profile

Actions

Feature #5416

closed

[API] Clone git repositories without using an SSH private key

Added by Tom Clegg about 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Story points:
0.5

Files

5416fails.log (48.6 KB) 5416fails.log Brett Smith, 04/10/2015 03:17 PM

Subtasks 10 (0 open10 closed)

Task #5485: Add install docsResolvedTom Clegg03/12/2015Actions
Task #5469: Make arv-git-http serverResolvedTom Clegg03/12/2015Actions
Task #5453: Look into authenticating git-http with arvados tokensResolvedTom Clegg03/12/2015Actions
Task #5488: Review 5416-arv-git-httpdResolvedRadhika Chippada03/12/2015Actions
Task #5640: Use uuid instead of name as repo path in arv-git-httpdResolvedTom Clegg03/12/2015Actions
Task #5641: Review 5416-repo-dir-is-uuidResolvedBrett Smith03/12/2015Actions
Task #5487: Click job script_name to see source code (plain)ResolvedTom Clegg03/12/2015Actions
Task #5486: Add arv-git-httpd to run_test_serversResolvedTom Clegg03/12/2015Actions
Task #5777: Resolve merge conflicts, check testsResolvedTom Clegg03/12/2015Actions
Task #5687: Review 5416-browse-repo-tree (with 5416-ssl-proxy on arvados-dev)ResolvedTom Clegg03/12/2015Actions

Related issues

Related to Arvados - Feature #5368: [Workbench] job scripts accessible in some wayClosed03/03/2015Actions
Has duplicate Arvados - Feature #3455: [Workbench] Install cgit or Gitweb or similar git frontend for browsing repository contents, update workbench to link to appropriate script version when displaying jobs and pipelines.ResolvedActions
Actions #1

Updated by Tom Clegg about 9 years ago

  • Category set to API
  • Assigned To set to Tom Clegg
Actions #2

Updated by Tom Clegg about 9 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Tom Clegg about 9 years ago

Client side plan:
  • Implement git-credentials glue (and probably add a bit of config advice on the Manage Account page) to automatically use Arvados token as password when connecting to Arvados-hosted repo URIs.
Tentative plan A: Implement a git http server using Apache and git-backend.
  • Use RewriteMap to send credentials to a token-checking program.
    • Probably Python.
    • If the token can {read/write} the repository using regular Arvados API calls, permit {fetch/push} git operations.
  • If token passes, run git-http-backend.
  • (Assuming we can use the RewriteMap result to set the "Authenticated?" flag so git-http-backend knows it's ok to receive-pack.)
Tentative plan B: Implement a git http server using Go and git-backend.
  • Similar to plan A, but write the HTTP server in Go instead of messing with Apache, Rewriterule, and Auth*.
  • Disadvantage: can't share an Apache process with something else.
  • Disadvantage: will need either a proxy deployed in front of it, or the added burden of its own TLS cert/key features. (Perhaps not a serious disadvantage either way.)
  • Advantage: probably better performance (less overhead / unused Apache features)
  • Advantage: token validations aren't serialized in a queue via RewriteMap prog (this seems like it could degrade poorly in the face of temporary network/server troubles).
Actions #4

Updated by Tom Clegg about 9 years ago

It's easy to tell git to send your API token as an HTTP authentication password:

git config --global credential.https://git.4xphq.arvadosapi.com.username none
git config --global credential.https://git.4xphq.arvadosapi.com.helper '!foo(){ echo "password=$ARVADOS_API_TOKEN"; };foo'

Or, if we want to do something less trivial (perhaps source ~/.config/arvados/settings.conf first) we can provide an absolute path to a helper script that outputs "password=whatever":

git config --global credential.https://git.4xphq.arvadosapi.com.helper `which arv-git-credential`

This means that once we have token authentication on the gitolite side, it will be easy to set up git on shell VMs so users can access their repositories without forwarding an SSH agent.

git config --system [...]

Reference: https://www.kernel.org/pub/software/scm/git/docs/technical/api-credentials.html

Actions #5

Updated by Tom Clegg about 9 years ago

Work-in-progress 5416-git-auth-token at 16421a1 status:
  • Works in my dev environment
  • Has no tests yet

Try:

git clone --bare git://git.curoverse.com/arvados.git /tmp/arvados.git
cd ~/src/arvados/services/arv-git-httpd
go build
ARVADOS_API_HOST=localhost:3030 ARVADOS_API_HOST_INSECURE=true ./arv-git-httpd -repo-root=/tmp -address=localhost:59624

(replace localhost:3030 with your dev/test server endpoint, or 4xphq.arvadosapi.com, or whatever)

Then:

git config --local credential.http://localhost:59624/.helper '!foo(){ echo password=$ARVADOS_API_TOKEN; };foo'
git config --local credential.http://localhost:59624/.username none

git ls-remote http://localhost:59624/arvados.git master
ARVADOS_API_TOKEN=[......] git ls-remote http://localhost:59624/arvados.git master

For production deployment it relies on a proxy to provide TLS. (HTTP basic authentication without TLS is only useful on a local/private network.)

Actions #6

Updated by Tom Clegg about 9 years ago

Added tests, install docs, and discovery doc support; now at b4f4cbf on 5416-arv-git-httpd.

There's a 5416-arv-git-httpd branch in arvados-dev too, with packaging and testing stuff.

Actions #7

Updated by Tom Clegg almost 9 years ago

  • Subject changed from [DRAFT] [API] Clone git repositories without using an SSH private key to [API] Clone git repositories without using an SSH private key
Actions #8

Updated by Ward Vandewege almost 9 years ago

reviewing 5416-arv-git-httpd

  1. git.curoverse.com/arvados.git/services/arv-git-httpd
    /tmp/tmp.o64n3HhKy5/src/git.curoverse.com/arvados.git/services/arv-git-httpd/auth_handler.go:60: r.BasicAuth undefined (type *http.Request has no field or method BasicAuth)
    FAIL git.curoverse.com/arvados.git/services/arv-git-httpd [build failed]

This suggests the golang version requirements have changed. I am running 1.3.3. What version is required to build arv-git-httpd?

Actions #9

Updated by Tom Clegg almost 9 years ago

Indeed, it seems BasicAuth() is new in Go 1.4. Added BasicAuth compatibility shim for go1.3 in 1f3874e. Now at 452dace

Actions #10

Updated by Radhika Chippada almost 9 years ago

Some review comments. Several of these are trivial and please feel free to ignore if you prefer.

  • Install guide updates
    I could not execute the steps and verify the correctness of the documentation as “sudo /usr/bin/apt-get install arv-git-httpd” fails.
  • The tests in arvados/services/arv-git-httpd directory failed for me (most of the times) when I ran them using “go test” . Ironically, none of the tests failed for 5 out of the 40 or so runs I made. Something is wrong. I see errors similar to the following and the location of the failures and number of test failures kept varying. Not sure if there are some test steps I am missing or something around synchronization logic in server.go
FAIL: server_test.go:45: IntegrationSuite.TestReadwrite

git [fetch http://[::]:51695/foo.git] => exit status 128
server_test.go:49:
    c.Assert(err, check.Equals, nil)
... obtained *errors.errorString = &errors.errorString{s:""} ("")
... expected = nil
  • All the tests did pass (repeatedly) when tested using arvados-dev run-tests
  • services/api/test/unit/repository_test.rb
    Would it make sense to add more tests such as “without write permission, user cannot change xxx?”
  • auth_handler
    • Is it possible to make the “arv.Update” optional? As of now, for every ServeHTTP call we are making two arv calls (List and Update). Wondering if is possible to execute the Update only if needed and omit it if it is going to be for a read-only operation?
    • Alternatively, would it be possible to cache the responses to avoid making calls repeatedly for the same token and same repo?
    • I think it might improve readability of code if “var escaper = strings …” is moved into func quoteStrings
  • basic_go13.go and basic_go14.go => basic_auth_go13.go and basic_auth_go14.go might serve as better names?
  • It wouldn’t hurt to call “toks” as “tokens” in basic_go13.go
  • basic_test.go
    • type basicAuthTestCase => I do not see harm in using full words for header and password here
    • It might be nice to have a “none:bar” basicAuthTestCase as well. Agree that it serves the same results as “foo:bar”, but because our derived username from api token would be “none”, it might serve as a good example.
  • server.go
    I am wondering if some synchronization logic here is what is causing “go test” failures for me? My go version is go1.3.1 and I am assuming you are using 1.4?
  • server_test.go
    • Would it make sense to do a fetch after the push in TestReadwrite?
    • Would it make sense to do two pushes with the same repository args in the same test one after the other and verify test results?
  • Executed steps in note #5. I was able to get past authentication and able to ls refs/heads/master. Tested only locally. Can you please add “cd arvados/services/arv-git-httpd” as the first instruction here (for go build)?
  • arvados-dev update
    • How is “retry” supposed to work in jenkins / do we intend to use it in jenkins? The test stopped and waited for me to respond to Y/n question and not sure how it is intended to be used in jenkins.
Actions #11

Updated by Tom Clegg almost 9 years ago

Radhika Chippada wrote:

Some review comments. Several of these are trivial and please feel free to ignore if you prefer.

  • Install guide updates
    I could not execute the steps and verify the correctness of the documentation as “sudo /usr/bin/apt-get install arv-git-httpd” fails.

Ah, right, that'll only work after these branches get merged and jenkins starts building arv-git-httpd packages.

  • The tests in arvados/services/arv-git-httpd directory failed for me (most of the times) when I ran them using “go test” . Ironically, none of the tests failed for 5 out of the 40 or so runs I made. Something is wrong. I see errors similar to the following and the location of the failures and number of test failures kept varying. Not sure if there are some test steps I am missing or something around synchronization logic in server.go

Not sure what's up here. It looks like git exits non-zero without printing anything on stderr, which seems strange. I've changed the "runGit" func so it returns the original error in that case instead of the empty string. Perhaps that will help diagnose (and seems like a good idea anyway).

  • All the tests did pass (repeatedly) when tested using arvados-dev run-tests

Could there be something different about how you're running apiserver outside run-tests?

  • services/api/test/unit/repository_test.rb
    Would it make sense to add more tests such as “without write permission, user cannot change xxx?”

Good call. Added "write perm insufficient to change name" and "cannot change modified_at without write perm".

  • auth_handler
    • Is it possible to make the “arv.Update” optional? As of now, for every ServeHTTP call we are making two arv calls (List and Update). Wondering if is possible to execute the Update only if needed and omit it if it is going to be for a read-only operation?
    • Alternatively, would it be possible to cache the responses to avoid making calls repeatedly for the same token and same repo?

That would reduce API calls and latency, but it would also introduce a delay in permissions taking effect, and the delay wouldn't even be the same as the gitolite delay, which I think would be confusing/annoying for users. IMO it's OK (or better) for now, at least, to stay simple and let the API server itself do the permission caching.

  • I think it might improve readability of code if “var escaper = strings …” is moved into func quoteStrings

That would create a new Replacer each time quoteStrings() runs, instead of creating just one at startup and reusing it...

  • basic_go13.go and basic_go14.go => basic_auth_go13.go and basic_auth_go14.go might serve as better names?

Renamed.

  • It wouldn’t hurt to call “toks” as “tokens” in basic_go13.go

Renamed.

  • basic_test.go
    • type basicAuthTestCase => I do not see harm in using full words for header and password here

FWIW, Go explicitly prefers short variable names for limited scope. https://github.com/golang/go/wiki/CodeReviewComments#variable-names

  • It might be nice to have a “none:bar” basicAuthTestCase as well. Agree that it serves the same results as “foo:bar”, but because our derived username from api token would be “none”, it might serve as a good example.

That sounds like it would only test base64 differently (which isn't ours to test), not our header recognition, so I'm inclined to leave it alone.

  • server.go
    I am wondering if some synchronization logic here is what is causing “go test” failures for me? My go version is go1.3.1 and I am assuming you are using 1.4?

I've been using 1.4 and 1.3.3, but then, I've also been using run-tests.

Perhaps worth updating to at least 1.3.3 before doing too much more troubleshooting? (Seems like debian jessie has 1.3.3... did you install a tarball from golang instead?)

  • server_test.go
    • Would it make sense to do a fetch after the push in TestReadwrite?

Not sure... Checking for "foo/.git/refs/heads/newbranch" seems like a more direct way of confirming that the push worked. Are you thinking the push could alter the permissions and cause fetch to fail?

  • Would it make sense to do two pushes with the same repository args in the same test one after the other and verify test results?

Not sure what this would test. (I don't really want to get into testing git itself...)

  • Executed steps in note #5. I was able to get past authentication and able to ls refs/heads/master. Tested only locally. Can you please add “cd arvados/services/arv-git-httpd” as the first instruction here (for go build)?

Done, but that note was a "work in progress" comment and isn't especially relevant now that we have actual tests & install instructions...

  • arvados-dev update
    • How is “retry” supposed to work in jenkins / do we intend to use it in jenkins? The test stopped and waited for me to respond to Y/n question and not sure how it is intended to be used in jenkins.

No, it's just for (our) interactive use, not for Jenkins. It makes the "edit code, run tests again" cycle faster. (By the time the code gets to Jenkins, it's expected to pass every time, so we don't want it to --retry anything.)

Now at d843787

Actions #12

Updated by Tom Clegg almost 9 years ago

  • Story points changed from 2.0 to 0.5
Actions #13

Updated by Tom Clegg almost 9 years ago

  • Target version changed from 2015-04-01 sprint to 2015-04-29 sprint
Actions #14

Updated by Brett Smith almost 9 years ago

Reviewing 949e725

The changes are good. Since you're taking days off, I've gone ahead and merged the branch with master, so it doesn't drift too far. Thanks!

I'm having an issue where the tests don't seem to run reliably. Here are different errors I got over the course of five runs:

FAIL: server_test.go:63: IntegrationSuite.TestNoPermission

git [fetch http://[::]:58941/active/foo.git] => exit status 128
server_test.go:68:
    c.Assert(err, check.ErrorMatches, `.* not found.*`)
... error string = "exit status 128" 
... regex string = ".* not found.*" 

FAIL: server_test.go:24: IntegrationSuite.TestPathVariants

git [fetch http://[::]:50132/active/foo.git] => <nil>
git [fetch http://[::]:50132/active/foo/.git] => <nil>
git [fetch http://[::]:50132/arvados.git] => exit status 128
server_test.go:30:
    c.Assert(err, check.Equals, nil)
... obtained *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc20801ea80)} ("exit status 128")
... expected = nil

FAIL: server_test.go:45: IntegrationSuite.TestReadwrite

git [fetch http://[::]:39919/active/foo.git] => exit status 128
server_test.go:49:
    c.Assert(err, check.Equals, nil)
... obtained *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc2080686e0)} ("exit status 128")
... expected = nil

The failures change on each run, and I did get a passing run, so I figured this was likely a separate issue from the changes you made in this branch, and that's why I felt comfortable merging. But I do think we should look into and resolve this, maybe in a separate ticket, before we start running these tests in Jenkins. It looks like a timing issue, possibly compounded by version differences across our systems.

Actions #15

Updated by Ward Vandewege almost 9 years ago

The jenkins tests fail consistently (see run 1448-1452) with this error:


        ********** Running services/arv-git-httpd tests **********

2015/04/05 20:42:27 authSettings: map[ARVADOS_API_TOKEN:somebigsecret ARVADOS_API_HOST:0.0.0.0:59853 ARVADOS_API_HOST_INSECURE:true]

----------------------------------------------------------------------
FAIL: /tmp/tmp.CSpA0PVzzK/src/git.curoverse.com/arvados.git/services/arv-git-httpd/server_test.go:76: IntegrationSuite.SetUpTest

/tmp/tmp.CSpA0PVzzK/src/git.curoverse.com/arvados.git/services/arv-git-httpd/server_test.go:87:
    c.Assert(err, check.Equals, nil)
... obtained *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc20804f5a0)} ("exit status 128")
... expected = nil

----------------------------------------------------------------------
PANIC: /tmp/tmp.CSpA0PVzzK/src/git.curoverse.com/arvados.git/services/arv-git-httpd/server_test.go:63: IntegrationSuite.TestNoPermission

... Panic: Fixture has panicked (see related PANIC)
OOPS: 0 passed, 1 FAILED, 5 MISSED
--- FAIL: Test (2.10 seconds)
FAIL
FAIL    git.curoverse.com/arvados.git/services/arv-git-httpd    2.105s

    ********** !!!!!! services/arv-git-httpd tests FAILED !!!!!! **********

I've disabled the arv-git-httpd tests on ci.curoverse.com for now.

Actions #16

Updated by Tom Clegg almost 9 years ago

Ward Vandewege wrote:

The jenkins tests fail consistently (see run 1448-1452) with this error:

[...]

I've disabled the arv-git-httpd tests on ci.curoverse.com for now.

Ah. I'm guessing this is because

jenkins@ci:/tmp/x$ git commit -am 'foo'

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com" 
  git config --global user.name "Your Name" 

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: empty ident  <jenkins@ci.curoverse.com
> not allowed
128!jenkins@ci:/tmp/x$

Added explicit config flags to avoid this in 6ab9f32.

Actions #17

Updated by Tom Clegg almost 9 years ago

Brett Smith wrote:

The failures change on each run, and I did get a passing run, so I figured this was likely a separate issue from the changes you made in this branch, and that's why I felt comfortable merging. But I do think we should look into and resolve this, maybe in a separate ticket, before we start running these tests in Jenkins. It looks like a timing issue, possibly compounded by version differences across our systems.

These failures seem to say "git exited 128 without saying anything on stdout or stderr" (sort of like this ?). I haven't come up with a plausible explanation, or encountered it myself even once, but I've added a bit more logging in f9382ce which could conceivably give us more to go on.

Actions #18

Updated by Brett Smith almost 9 years ago

Tom Clegg wrote:

These failures seem to say "git exited 128 without saying anything on stdout or stderr" (sort of like this ?). I haven't come up with a plausible explanation, or encountered it myself even once, but I've added a bit more logging in f9382ce which could conceivably give us more to go on.

I've attached a log of five failures under go test -check.vv.

Actions #19

Updated by Peter Amstutz almost 9 years ago

workbench/app/models/repository.rb#http_fetch_url doesn't use 'gitHttpBase' which is advertised in the discovery document. (I hacked it with the right URL for my dev setup so I could continue the review.)

The "manage account" page doesn't link to the repository browser, but I'm guessing that is an intentional feature omission since there is not yet a way to choose which a branch or revision to view.

Why does it provide a flat list of the files including subdirectories, when it understands directories and provides breadcrumbs?

Loading the "tree" view fails inconsistently. If I hit reload over and over, it fails about 1 out of every 5 times.

#<Repository::GitCommandError: `git fetch http://localhost:3005/peter2.git +*:*` pid 18051 exit 128: >
/home/peter/work/arvados/apps/workbench/app/models/repository.rb:110:in `run_git'
/home/peter/work/arvados/apps/workbench/app/models/repository.rb:58:in `refresh'
/home/peter/work/arvados/apps/workbench/app/models/repository.rb:27:in `ls_tree_lr'
/home/peter/work/arvados/apps/workbench/app/models/repository.rb:39:in `ls_subtree'
/home/peter/work/arvados/apps/workbench/app/controllers/repositories_controller.rb:23:in `show_tree'
/home/peter/.rvm/gems/ruby-2.1.1/gems/actionpack-4.1.9/lib/action_controller/metal/implicit_render.rb:4:i

I clicked on "apps / workbench / app / assets / images / dax.png" and got "This file has an invalid text encoding". I'm glad it didn't dump binary data into the HTML file, but it would be nice if it used MIME types to decide what to do with the file, and/or offer an option to download the file. Although, on further consideration, this quickest way to do that would entail gatewaying the data from git through workbench, it's bad enough that we do that for downloading files from collections, we should try to avoid making that mistake again.

The root directory of a repository is linked with two slashes in the breadcrumbs:

http://localhost:3000/repositories/4n8aq-s0uqq-38lrtxmtuy0w1e7/tree/09ef06513704a54a0edceb6ef2ad20c74fa30cf7//
Actions #20

Updated by Tom Clegg almost 9 years ago

Peter Amstutz wrote:

workbench/app/models/repository.rb#http_fetch_url doesn't use 'gitHttpBase' which is advertised in the discovery document. (I hacked it with the right URL for my dev setup so I could continue the review.)

Any objection to changing this to gitUrl or gitUrlBase to match the other similar discovery doc fields (baseUrl, rootUrl) and avoid connoting it will always be http://?

The "manage account" page doesn't link to the repository browser, but I'm guessing that is an intentional feature omission since there is not yet a way to choose which a branch or revision to view.

Right. I'm trying to walk the line between "merge a feature that is too incomplete to actually use" and "take forever to implement a full-featured repository browser/editor/etc".

Why does it provide a flat list of the files including subdirectories, when it understands directories and provides breadcrumbs?

Tossed around ideas about this with Bryan. Just showing a single level would make it super annoying to navigate a tree of files, until we added "expand/collapse this directory" and "expand all directories". The simplest non-frustrating option seemed to be "always expand all".

We can certainly add some javascript to expand/collapse but I didn't think this should be a blocker. Would you agree?

Loading the "tree" view fails inconsistently. If I hit reload over and over, it fails about 1 out of every 5 times.

[...]

Hm. I haven't seen that; I'll try to reproduce. Let me know if you happen to notice more clues.

I clicked on "apps / workbench / app / assets / images / dax.png" and got "This file has an invalid text encoding". I'm glad it didn't dump binary data into the HTML file, but it would be nice if it used MIME types to decide what to do with the file, and/or offer an option to download the file. Although, on further consideration, this quickest way to do that would entail gatewaying the data from git through workbench, it's bad enough that we do that for downloading files from collections, we should try to avoid making that mistake again.

This is another instance of "Hopefully it's better to display only plain text than to not show the code at all because we haven't made it handle images/PDFs/markdown/source-highlighting/whatever yet".

The motivation that makes this a priority is to reveal the specific version of the specific file that was used to run a specific job, without sending you off to a command line. The current features are -- hopefully -- enough to cover that plus "import thebrainsoftheoperation; dostuff() doesn't tell me much, can I see thebrainsoftheoperation.py please"...

Actions #21

Updated by Peter Amstutz almost 9 years ago

Any objection to changing this to gitUrl or gitUrlBase to match the other similar discovery doc fields (baseUrl, rootUrl) and avoid connoting it will always be http://?

No objection, except there's already a git_host field, and there's no harm in continuing to distinguish between "the host you ssh to" and "the host you use http with".

Why does it provide a flat list of the files including subdirectories, when it understands directories and provides breadcrumbs?

Tossed around ideas about this with Bryan. Just showing a single level would make it super annoying to navigate a tree of files, until we added "expand/collapse this directory" and "expand all directories". The simplest non-frustrating option seemed to be "always expand all".

I think showing everything by default actually makes it more annoying, because if you have 1000 files in directory 'A' but you want to look at the contents of directory 'B' you have to scroll past all the stuff in directory 'A' you don't care about. Browsing a non-trivial tree is basically impossible if you don't already know what you are looking for. Even github (who presumably has the resources to pay real UX designers to think about this stuff) does it one-page-per-directory-level.

(It may be that you're looking at nice neat repos with just a few things, whereas in my tests I happen to browsing a clone of the Arvados repository, which has a lot more stuff...)

We can certainly add some javascript to expand/collapse but I didn't think this should be a blocker. Would you agree?

I don't think we should block on adding Javascript, but if you choose not intend to keep the "one big page", consider using indentation indicate directory nesting (like the collections page) instead of repeating the directory path over and over (which adds visual clutter.)

Loading the "tree" view fails inconsistently. If I hit reload over and over, it fails about 1 out of every 5 times.

[...]

Hm. I haven't seen that; I'll try to reproduce. Let me know if you happen to notice more clues.

I am getting it quite a lot and would consider it a major blocker unless/until we can identify that it's caused by something specific to my dev setup. Any suggestions to debug?

The motivation that makes this a priority is to reveal the specific version of the specific file that was used to run a specific job, without sending you off to a command line. The current features are -- hopefully -- enough to cover that plus "import thebrainsoftheoperation; dostuff() doesn't tell me much, can I see thebrainsoftheoperation.py please"...

Yes, I understand, I don't want to load up on feature requests because something is still better than nothing.

More notes:

Not excited by the approach of making a workbench local clone, but I understand why you did it. In principal you should be able to access individual git objects as needed over HTTP and render one tree or file at a time without pulling the whole repository, but in practice that would mean having to write a small git protocol client which is probably more complicated than shelling out to the git command line.

Actions #22

Updated by Tom Clegg almost 9 years ago

Peter Amstutz wrote:

I think showing everything by default actually makes it more annoying, because if you have 1000 files in directory 'A' but you want to look at the contents of directory 'B' you have to scroll past all the stuff in directory 'A' you don't care about. Browsing a non-trivial tree is basically impossible if you don't already know what you are looking for.

Meh. Browsers have "find", which works well in an all-expanded tree, and doesn't work at all otherwise.

(It may be that you're looking at nice neat repos with just a few things, whereas in my tests I happen to browsing a clone of the Arvados repository, which has a lot more stuff...)

That was my test case too. I was able to view (1) the crunch script for the job I was looking at; (2) the rest of the stuff in crunch_scripts; (3) the whole thing. The whole experience was much better than "I wish I could see the code in Workbench". :)

I don't think we should block on adding Javascript, but if you choose not intend to keep the "one big page", consider using indentation indicate directory nesting (like the collections page) instead of repeating the directory path over and over (which adds visual clutter.)

Do you consider this a blocker or would you accept getting feedback from users (about this and other things) in the meantime?

Loading the "tree" view fails inconsistently. If I hit reload over and over, it fails about 1 out of every 5 times.

[...]

Hm. I haven't seen that; I'll try to reproduce. Let me know if you happen to notice more clues.

I am getting it quite a lot and would consider it a major blocker unless/until we can identify that it's caused by something specific to my dev setup. Any suggestions to debug?

No, I don't want to merge without some indication this isn't going to happen in production.

This is so similar to Brett's reports (git says nothing and exits 128) -- surely it's related.

What's your git --version? (I have 1.7.10.4)

It's possible the Go version is related (I've tested with 1.3.3 but usually use 1.4.2). But surely "git says nothing and exits 128" shouldn't happen no matter what arv-git-httpd or anything else is doing...?

Not excited by the approach of making a workbench local clone, but I understand why you did it. In principal you should be able to access individual git objects as needed over HTTP and render one tree or file at a time without pulling the whole repository, but in practice that would mean having to write a small git protocol client which is probably more complicated than shelling out to the git command line.

"Probably", hah... The git tools are usually modular enough that we don't have to write our own. It's possible some "shallow" feature could save some bandwidth, but even that seemed like a premature optimization. I figured typical usage involves viewing many objects/files/versions in a given repository anyway; this way once you have a commit you don't have to keep fetching it.

Incidentally, a local copy also lets you resolve arbitrary gitrevisions, which could facilitate nice features in other parts of Workbench (e.g., type "master^^" in the version box and see the commit hash that will be used).

A browser-based client would be nice, but seems impractical at the moment.

Actions #23

Updated by Tom Clegg almost 9 years ago

After lots of testing, I've convinced myself the inconsistent failures are the fault of git (or something under it).
  • wheezy, git version 1.7.10.4 → everything always works.
  • jessie, git version 2.1.4 → unreliable.
  • The failure mode involves making an http request with no authentication header (to which arv-git-httpd responds 401) and then giving up instead of making another request with the configured credentials.

(From Brett's attachment) what it looks like when git works:

2015/04/10 11:12:24 "[::1]:34569" "" "" 401 "no credentials provided" "" "/active/foo.git/info/refs" 
2015/04/10 11:12:24 "[::1]:34570" "" "" 401 "no credentials provided" "" "/active/foo.git/info/refs" 
2015/04/10 11:12:24 "[::1]:34570" "none" "3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi" 200 "read" "active/foo" "/zzzzz-s0uqq-382brsig8rp3666/.git/info/refs" 
2015/04/10 11:12:24 "[::1]:34572" "none" "3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi" 200 "write" "active/foo" "/zzzzz-s0uqq-382brsig8rp3666/.git/git-receive-pack" 
git [push http://[::]:54731/active/foo.git master:newbranch] => <nil>
To http://[::]:54731/active/foo.git
 * [new branch]      master -> newbranch

...
PASS: server_test.go:45: IntegrationSuite.TestReadwrite    0.402s

What it looks like when git gives up early:

2015/04/10 11:15:53 "[::1]:36103" "" "" 401 "no credentials provided" "" "/active/foo.git/info/refs" 
git [fetch http://[::]:34995/active/foo.git] => exit status 128

...
FAIL: server_test.go:45: IntegrationSuite.TestReadwrite

I tried a few things like disabling HTTP keepalive (that didn't help) and ensuring the credential helper really was supplying tokens as expected (it was) and eventually concluded it just isn't feasible (or especially important) to do repository browsing reliably when git is unreliable.

Now at 983e7c8 with
  • Repo browsing features disabled if git2 is installed.
  • Config switch to override that and use git2 anyway if you don't care it's unreliable, or know your version's fixed, or are trying to find out whether your version's fixed.
  • No second trailing slash on repo breadcrumb link.
  • Repository responses have both ssh and https in clone_urls, with more config options (you can set each to "true" for default git.zzzzz.arvadosapi.com or "false" to disable).
  • arv-git-httpd's own tests updated to use http://user:pass@host:port/ instead of credential helpers, because that seems to work in both git 1.7 and git 2.1.
...and d648569 on arvados-dev with
  • run keepproxy, arv-git-httpd, and SSL proxy services during Workbench test suites
  • support for services/arv-git-httpd_test=-check.vv
  • terminate connections on the configured test database (not necessarily arvados_test).
Actions #24

Updated by Peter Amstutz almost 9 years ago

In arvados-dev 5416-ssl-proxy, if the test fails (due to a misconfiguration on my part, here) it doesn't shut down keepproxy:

               ********** Running apps/workbench tests **********

Starting keepproxy, arv-git-httpd, and nginx ssl proxy...
2015/04/22 13:29:58 Arvados Keep proxy started listening on [::]:38986
2015/04/22 13:29:58 Updated services list: locals map[zzzzz-bi6l4-rsnj3c76ndxb7o0:http://keep1.zzzzz.arvadosapi.com:25107 zzzzz-bi6l4-6zhilxar6r8ey90:http://keep0.zzzzz.arvadosapi.com:25107] gateways map[zzzzz-bi6l4-6zhilxar6r8ey90:http://keep0.zzzzz.arvadosapi.com:25107 zzzzz-bi6l4-rsnj3c76ndxb7o0:http://keep1.zzzzz.arvadosapi.com:25107]
Traceback (most recent call last):
  File "sdk/python/tests/run_test_server.py", line 570, in <module>
    run_keep_proxy()
  File "sdk/python/tests/run_test_server.py", line 353, in run_keep_proxy
    filters=[['service_type','=','proxy']]).execute()['items']:
  File "/tmp/tmp.d8VNH2nIBd/local/lib/python2.7/site-packages/oauth2client/util.py", line 137, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/tmp/tmp.d8VNH2nIBd/local/lib/python2.7/site-packages/googleapiclient/http.py", line 722, in execute
    body=self.body, headers=self.headers)
  File "/home/peter/work/arvados/sdk/python/arvados/api.py", line 33, in _intercept_http_request
    return self.orig_http_request(uri, **kwargs)
  File "/tmp/tmp.d8VNH2nIBd/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1608, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/tmp/tmp.d8VNH2nIBd/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1350, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/tmp/tmp.d8VNH2nIBd/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1278, in _conn_request
    raise ServerNotFoundError("Unable to find the server at %s" % conn.host)
httplib2.ServerNotFoundError: Unable to find the server at petere1

        ********** !!!!!! apps/workbench tests FAILED !!!!!! **********

$ ps ax |grep keep
 5477 pts/0    Sl     0:00 keepproxy -pid=/home/peter/work/arvados/tmp/keepproxy.pid -listen=:38986
Actions #25

Updated by Peter Amstutz almost 9 years ago

Tests in 5416-browse-repo-tree pass.

It would be nice if you could incorporate 5416-use-netrc-credentials so it works on git2 but I won't consider it a blocker.

Rest of it LGTM.

Actions #26

Updated by Tom Clegg almost 9 years ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF