Bug #3551

[Keep] Clean up source tree layout

Added by Tom Clegg about 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Start date:
08/09/2014
Due date:
% Done:

100%

Estimated time:
(Total: 6.00 h)
Story points:
1.0

Description

Objectives
  • Conform to Go conventions, see http://golang.org/doc/code.html
  • Install keep programs and libraries using "go get {url}" and implicitly via "import"
  • Avoid superfluous hierarchy in source tree
  • Avoid proliferation of build directories inside the source tree that need .gitignore (i.e., do not expect people to use a GOPATH inside our source tree)
  • Rename "keep" program to more specific "keepstore"

Subtasks

Task #3554: Update jenkins scriptResolvedTom Clegg

Task #3553: Review 3551-go-layoutResolvedTom Clegg

Task #3552: Clean up source tree, make sure tests and docker workResolvedTom Clegg

Associated revisions

Revision 24415941
Added by Tom Clegg about 5 years ago

Merge branch '3551-go-layout' closes #3551

Revision 3d678bcc (diff)
Added by Tom Clegg about 5 years ago

Update Go tests to new layout, rename keepstore, add missing tests for keepproxy and Go SDK libraries. refs #3551

Revision 3d678bcc (diff)
Added by Tom Clegg about 5 years ago

Update Go tests to new layout, rename keepstore, add missing tests for keepproxy and Go SDK libraries. refs #3551

Revision 1ae46a07
Added by Ward Vandewege about 5 years ago

Merge branch '3551-go-layout'

refs #3551

Revision 1ae46a07
Added by Ward Vandewege about 5 years ago

Merge branch '3551-go-layout'

refs #3551

Revision d9b0e600
Added by Ward Vandewege about 5 years ago

Merge branch '3551-go-layout'

refs #3551

Revision d9b0e600
Added by Ward Vandewege about 5 years ago

Merge branch '3551-go-layout'

refs #3551

Revision b5a27878 (diff)
Added by Tom Clegg about 5 years ago

gitignore sdk/cli/vendor. refs #3551

Revision 17341ef8
Added by Ward Vandewege about 5 years ago

Merge branch '3551-go-layout'

refs #3551

Revision 17341ef8
Added by Ward Vandewege about 5 years ago

Merge branch '3551-go-layout'

refs #3551

Revision 2d816cf5 (diff)
Added by Ward Vandewege about 5 years ago

keep is now called keepstore

refs #3551

Revision 2d816cf5 (diff)
Added by Ward Vandewege about 5 years ago

keep is now called keepstore

refs #3551

Revision 17e388d6 (diff)
Added by Ward Vandewege about 5 years ago

docker: be a bit smarter in the make clean target, so that we also
accomodate diretories underneath the generated directory.

refs #3551

History

#1 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#2 Updated by Tom Clegg about 5 years ago

New instructions for hacking/testing/installing Keep utilities:

Install daemons

Set GOPATH (e.g., GOPATH=~/go -- not inside the source tree).

go get git.curoverse.com/arvados.git/services/keepproxy
go get git.curoverse.com/arvados.git/services/keepstore
go get git.curoverse.com/arvados.git/services/crunchstat

That's it! Now keepproxy and keepstore programs are in your $GOPATH/bin directory.

Update daemons and libraries to latest (master)

go get -u git.curoverse.com/arvados.git/services/keepproxy
go get -u git.curoverse.com/arvados.git/services/keepstore
go get -u git.curoverse.com/arvados.git/sdk/go/arvadosclient
go get -u git.curoverse.com/arvados.git/sdk/go/keepclient
go get -u git.curoverse.com/arvados.git/sdk/go/streamer

Use libraries in your Go programs

import (
    "git.curoverse.com/arvados.git/sdk/go/streamer" 
    "git.curoverse.com/arvados.git/sdk/go/keepclient" 
    "git.curoverse.com/arvados.git/sdk/go/arvadosclient" 
)

Test local modifications

Say you want to build a "keepproxy" binary using your local modifications (to both the keepproxy.go application and the keepclient.go imported library). Your working directory is ~/src/arvados.

Just make a symlink in your GOPATH/src directory, so Go uses your modified tree instead of fetching it from git. Then use go install, go test, etc.

rm -r $GOPATH/src/git.curoverse.com
mkdir $GOPATH/src/git.curoverse.com
ln -s ~/src/arvados $GOPATH/src/git.curoverse.com/arvados.git
# install dependencies for testing
go get -t git.curoverse.com/arvados.git/services/keepstore
# run tests
go test git.curoverse.com/arvados.git/services/keepstore
# install binary
go install git.curoverse.com/arvados.git/services/keepstore

Note one dependency that Go doesn't handle automatically: you must install keepstore before you expect keepproxy tests to pass.

go get git.curoverse.com/arvados.git/services/keepstore
go get -t git.curoverse.com/arvados.git/services/keepproxy
go test git.curoverse.com/arvados.git/services/keepproxy

Run Python tests

Install keepstore and keepproxy before running Python tests. Use a locally modified version (see above) or a let Go install them for you automatically -- whichever you want to test.

export GOPATH=~/go
go install git.curoverse.com/arvados.git/services/keepstore
go install git.curoverse.com/arvados.git/services/keepproxy
python setup.py test

TODO

If we want our import paths to look like "arvados.org/keepclient" then we just need to make sure "arvados.org/keepclient?go_get=1" shows the following meta tag:

<meta name="go-import" content="arvados.org/keepclient git https://git.curoverse.com/arvados.git/sdk/go/keepclient">

#3 Updated by Tom Clegg about 5 years ago

Testing everything from scratch

CI build should look something like this.

export GOPATH=/tmp/gopath
TESTME=/path/to/checked-out/arvados

# Make sure we test exactly the version we think we're testing
mkdir -p "$GOPATH/src/git.curoverse.com" 
ln -sfn "$TESTME" "$GOPATH/src/git.curoverse.com/arvados.git" 

# Make sure we don't accidentally use old binaries
rm -rf "$GOPATH/pkg" "$GOPATH/bin" 

# Note: keepstore must be installed in GOPATH in order for keepproxy's tests to pass.
for prog in keepstore keepproxy
do
  # Install $prog and its dependencies (including testing deps)
  go get -t git.curoverse.com/arvados.git/services/$prog
  # Run tests for $prog
  go test git.curoverse.com/arvados.git/services/$prog
done

# Run Python tests. They will use the $GOPATH/bin/keepstore and keepproxy we installed above.
...
$VENVDIR/bin/python setup.py test

Migrating from keep to keepstore

Perhaps: add a symlink (keepkeepstore) to the deb package; update servers; update run scripts to invoke keepstore directly; then remove the symlink from the deb package.

#4 Updated by Tom Clegg about 5 years ago

  • Status changed from New to In Progress

#5 Updated by Tom Clegg about 5 years ago

  • Target version set to 2014-08-27 Sprint

#6 Updated by Tom Clegg about 5 years ago

  • Category set to Keep
  • Assigned To set to Tom Clegg

#7 Updated by Brett Smith about 5 years ago

I agree completely that we should try to have our SDKs behave in a way that's familiar to developers of the language. Thanks for taking the time to do this cleanup. You get big points for eliminating the need for specific build scripts, and documenting how things come together.

Let's start with the easy stuff:

Running tests

run_test_server will crash with a KeyError if GOPATH isn't set. It will also manipulate PATH incorrectly if there's more than one directory in GOPATH. I suggest the following:

_go_binpaths = ':'.join(os.path.join(path, 'bin') for path in
                        os.environ.get('GOPATH', '').split(':'))
if _go_binpaths:
    os.environ['PATH'] = ':'.join([_go_binpaths, os.environ['PATH']])

True, if GOPATH isn't set, run_test_server probably won't find the Keep binaries. But an exception saying "keepstore not found" is probably the best error to throw, and we probably shouldn't require GOPATH to be set just in case the binaries are already in PATH (installed by Debian package)?

As a documentation point, it seems necessary to get and install all of the Keep packages before running any tests. At the very least, the keepproxy tests use run_test_server to invoke keepstore, so you can't go test keepproxy before you go install keepstore. I think you know this, but it bit me when I was copying and pasting your example commands.

Code changes

keepcliest/keepclient_test.go has some old-fashioned import paths in it:

brinstar % go get -t git.curoverse.com/arvados.git/sdk/go/keepclient
package arvados.org/sdk: unrecognized import path "arvados.org/sdk" 
package arvados.org/streamer: unrecognized import path "arvados.org/streamer" 

Docker

Your changes to setting file modes solves one problem (don't have to chmod in the Dockerfile, yay!) but creates another. I run with an 077 umask. Setting generated file modes based on the template mode means, for users like me, you'll copy the owner's executable bit correctly, but permissions will be all wrong for group and other.

I previously wrote code to generate a mode starting from the template's owner bits, copying the read and execute bits to group and other, and then setting it like you've done. During review that got tossed in favor of the simplicity of setting umask. Maybe you can resuscitate it by going through config.rb's git log.

Local development

Now we get to the not-so-easy part. Your comments describe a procedure to have Go get all its modules from local source. That seems to come with a huge caveat: Go will try very hard to work from an up-to-date master. This makes it difficult to have Go work from a different branch, or with local modifications.

I set up the symlinks like you described, wrote the patch for run_test_server (it came up early in the diff), then started trying out tests. First problem:

brinstar % go get -t git.curoverse.com/arvados.git/services/keepstore
# cd /home/brett/.local/go/src/git.curoverse.com/arvados.git; git checkout master
error: Your local changes to the following files would be overwritten by checkout:
        sdk/python/tests/run_test_server.py
Please, commit your changes or stash them before you can switch branches.
Aborting
package git.curoverse.com/arvados.git/services/arvadosclient: exit status 1

I'm on your branch. Go doesn't like that and tries to check out master. That's annoying, but fine, I'll play along. I stash my patch, check out master, then reset HEAD to the 3551 branch to try to trick it.

brinstar % go get -t git.curoverse.com/arvados.git/services/keepstore
# cd /home/brett/.local/go/src/git.curoverse.com/arvados.git; git pull --ff-only
fatal: Not possible to fast-forward, aborting.
package git.curoverse.com/arvados.git/services/keepstore: exit status 128

It's not enough to be on master; Go tries to pull the latest master. I managed to do the rest of my testing by going back to origin/master, merging the 3551 branch locally, and going from there. But things would've started breaking again if someone pushed to origin/master while I was working. I think this loses a lot of the convenience of working from a local development tree. Should I be doing something differently? Are there additional configuration knobs I can set to tell Go "accept my Git checkout in its current state as completely canonical?"

#8 Updated by Tom Clegg about 5 years ago

Brett Smith wrote:

run_test_server will crash with a KeyError if GOPATH isn't set. It will also manipulate PATH incorrectly if there's more than one directory in GOPATH. I suggest the following:

[...]

Yes, that looks better, thanks. I'll add that.

True, if GOPATH isn't set, run_test_server probably won't find the Keep binaries. But an exception saying "keepstore not found" is probably the best error to throw, and we probably shouldn't require GOPATH to be set just in case the binaries are already in PATH (installed by Debian package)?

Also a good point.

As a documentation point, it seems necessary to get and install all of the Keep packages before running any tests. At the very least, the keepproxy tests use run_test_server to invoke keepstore, so you can't go test keepproxy before you go install keepstore. I think you know this, but it bit me when I was copying and pasting your example commands.

Ah yes, I did discover this and document it in the "Running tests from scratch" script, but I didn't go back and update the earlier "look how rosy everything is" bit at "test local modifications". I'll make another note up there.

keepclient/keepclient_test.go has some old-fashioned import paths in it:

Indeed, will fix. (I also seem to have missed some renaming-sdk-to-arvadosclient opportunities.)

Your changes to setting file modes solves one problem (don't have to chmod in the Dockerfile, yay!) but creates another. I run with an 077 umask. Setting generated file modes based on the template mode means, for users like me, you'll copy the owner's executable bit correctly, but permissions will be all wrong for group and other.

I previously wrote code to generate a mode starting from the template's owner bits, copying the read and execute bits to group and other, and then setting it like you've done. During review that got tossed in favor of the simplicity of setting umask. Maybe you can resuscitate it by going through config.rb's git log.

Indeed, I forgot about umask. Copying rx from u to go feels a bit weird -- I'd rather just copy the exact bits of the file -- but git seems to ignore group/other permissions (I suppose otherwise it would have to habitually poke your umask in the eye, which would be rude) so we can't distribute "this file should be rw------" information via native file modes anyway.

Poking further, git also seems to ignore the u=w bit, so we could achieve the same result by doing "chmod -R a+rX */generated" in Makefile... right?

Now we get to the not-so-easy part. Your comments describe a procedure to have Go get all its modules from local source. That seems to come with a huge caveat: Go will try very hard to work from an up-to-date master. This makes it difficult to have Go work from a different branch, or with local modifications.

Hm, none of this stuff ever happened to me. I worked on my branch and go just did what I wanted (i.e., leave my source tree alone, build whatever's there). And yes, building from local modifications is absolutely essential, so let's figure out what magic button I pushed!

Shots in the dark:

  • What's your `go version`? Mine is go version go1.3 linux/amd64.
  • If different, does your `go help get` message say this, like mine does? "The -u flag instructs get to use the network to update the named packages and their dependencies. By default, get uses the network to check out missing packages but does not use it to look for updates to existing packages."

#9 Updated by Tom Clegg about 5 years ago

Updates at 32cecc7
  • Fixed paths in keepclient.
  • Made keepclient tests pass. (Uncovered and fixed an unrelated keepclient reader bug in the process, see 9ccff20)
  • Stop throwing away log messages from keepproxy/keepstore (when invoked from run_test_servers.py) and run_test_servers.py itself (when invoked from keepclient/keepproxy tests)
  • Set mode of generated files to either 0755 or 0644, depending on whether the template file is executable. Rationale: Git lets us express only "executable" and "not executable". Any other differences must be site-specific (e.g., umask), can't come from git, can't get recorded in git, and should therefore be ignored when generating docker images.

No further clues yet about why your Go insists on "master" while mine doesn't.

#10 Updated by Brett Smith about 5 years ago

Tom Clegg wrote:

Brett Smith wrote:

run_test_server will crash with a KeyError if GOPATH isn't set. It will also manipulate PATH incorrectly if there's more than one directory in GOPATH. I suggest the following:

[...]

Yes, that looks better, thanks. I'll add that.

Actually, my version still botches the empty GOPATH case. Try this instead:


if os.environ.get('GOPATH'):
    _go_binpaths = ':'.join(os.path.join(path, 'bin') for path in
                            os.environ['GOPATH'].split(':'))
    os.environ['PATH'] = ':'.join([_go_binpaths, os.environ['PATH']])

Indeed, I forgot about umask. Copying rx from u to go feels a bit weird -- I'd rather just copy the exact bits of the file -- but git seems to ignore group/other permissions (I suppose otherwise it would have to habitually poke your umask in the eye, which would be rude) so we can't distribute "this file should be rw------" information via native file modes anyway.

Poking further, git also seems to ignore the u=w bit, so we could achieve the same result by doing "chmod -R a+rX */generated" in Makefile... right?

I think that will also work, yeah. It looks like we're already doing that in a few individual cases (chmod 755 foo/generated/…).

Hm, none of this stuff ever happened to me. I worked on my branch and go just did what I wanted (i.e., leave my source tree alone, build whatever's there). And yes, building from local modifications is absolutely essential, so let's figure out what magic button I pushed!

Shots in the dark:

  • What's your `go version`? Mine is go version go1.3 linux/amd64.

1.2.2. I'll upgrade and see if that makes my issues go away.

  • If different, does your `go help get` message say this, like mine does? "The -u flag instructs get to use the network to update the named packages and their dependencies. By default, get uses the network to check out missing packages but does not use it to look for updates to existing packages."

Yes it does.

#11 Updated by Tom Clegg about 5 years ago

Brett Smith wrote:

Actually, my version still botches the empty GOPATH case. Try this instead:

Yeah, I caught that too. :) Why not "collect all three" -- here's what I committed:

if 'GOPATH' in os.environ:
    gopaths = os.environ['GOPATH'].split(':')
    gobins = [os.path.join(path, 'bin') for path in gopaths]
    os.environ['PATH'] = ':'.join(gobins) + ':' + os.environ['PATH']

I think that will also work, yeah. It looks like we're already doing that in a few individual cases (chmod 755 foo/generated/…).

Ooh, yeah. Removed a few that are no longer needed.

  • If different, does your `go help get` message say this, like mine does? "The -u flag instructs get to use the network to update the named packages and their dependencies. By default, get uses the network to check out missing packages but does not use it to look for updates to existing packages."

Yes it does.

Another thought: Did you try running "go get -u" at any point? I got that from the go docs but I don't think I actually tried it myself. Hopefully it doesn't do the thing ruby "bundle" does with the --vendor (?) flag: "you did that once, so I'll do that by default from now on"...?

#12 Updated by Tom Clegg about 5 years ago

At least superficially, go1.2 looks like it shouldn't do what it did to you. See get.go from version 1.2

downloadPackage(p) is the part that ends up contemplating "git checkout master".

        // Download if the package is missing, or update if we're using -u.
        if p.Dir == "" || *getU {
                // The actual download.
                stk.push(p.ImportPath)
                err := downloadPackage(p)

I guess at some point we could give up and say "requires Go 1.3" but I still feel like we're missing something else.

#13 Updated by Brett Smith about 5 years ago

Reviewing 2a4e0d9

  • I'm kind of curious why my Go misbehaved too, but I'm not sure it's worth spending more time to try to track that down. It's important that we be able to build and test from an arbitrary filesystem, and we've both demonstrated to ourselves that that's doable. I think we should debug further if someone else hits the same issue.
  • In config.rb, it shouldn't be necessary to chmod the output file after creation, because we set our own umask at the top of this section (and I think that's a more appropriate solution, given the nature of what this script does).
  • In run_test_server.py, is log_stream() necessary? subprocess.Popen gives children the parent's stdin/stdout/stderr unless specified otherwise, and I've seen the stderr of the Keep server when I run Python SDK tests in the past. I think the Go stderr copying is what we really needed here, and I think it's better to avoid dealing with logfiles if we can help it.
  • In the Go tests, is there a way we might reduce duplication of the Go code necessary to fire up different incantations of run_test_server.py? Even if it's just refactoring the two invocations into a function, I think that would be worthwhile. Sharing that function across all our Go tests would make me even happier, but I realize that might be more trouble than it's worth. (Can we cheat with a git symlink, like we do with run_test_server.py in the FUSE tests?)

Thanks again.

#14 Updated by Tom Clegg about 5 years ago

Brett Smith wrote:

  • I'm kind of curious why my Go misbehaved too, but I'm not sure it's worth spending more time to try to track that down. It's important that we be able to build and test from an arbitrary filesystem, and we've both demonstrated to ourselves that that's doable. I think we should debug further if someone else hits the same issue.

Agreed. Perhaps Jenkins will be the pudding in which our proof is found. :)

  • In config.rb, it shouldn't be necessary to chmod the output file after creation, because we set our own umask at the top of this section (and I think that's a more appropriate solution, given the nature of what this script does).

Oh, duh. Thanks for pointing that out. Removed.

  • In run_test_server.py, is log_stream() necessary? subprocess.Popen gives children the parent's stdin/stdout/stderr unless specified otherwise, and I've seen the stderr of the Keep server when I run Python SDK tests in the past. I think the Go stderr copying is what we really needed here, and I think it's better to avoid dealing with logfiles if we can help it.

Oh, I think I see what happened here. (I agree about logfiles.) I thought I was losing the logs because Popen's default stderr is None, so I fixed that. Then the logs were still missing, so I fixed the fact that Go was discarding stderr coming from Python. I never went back to question whether stderr=None had been totally irrelevant, which seems likely now that you point it out (or I read the Popen docs more carefully).

Reverted.

  • In the Go tests, is there a way we might reduce duplication of the Go code necessary to fire up different incantations of run_test_server.py? Even if it's just refactoring the two invocations into a function, I think that would be worthwhile. Sharing that function across all our Go tests would make me even happier, but I realize that might be more trouble than it's worth. (Can we cheat with a git symlink, like we do with run_test_server.py in the FUSE tests?)

I had this exact thought too -- there's a 2 x 2 matrix of nearly identical code -- but I feel like I've already expanded my Go horizons enough for this "source tree layout" story. :) Perhaps this can be part of a future cleanup phase?

3551-go-layout now at e4353b2

#15 Updated by Brett Smith about 5 years ago

Tom Clegg wrote:

  • In the Go tests, is there a way we might reduce duplication of the Go code necessary to fire up different incantations of run_test_server.py? Even if it's just refactoring the two invocations into a function, I think that would be worthwhile. Sharing that function across all our Go tests would make me even happier, but I realize that might be more trouble than it's worth. (Can we cheat with a git symlink, like we do with run_test_server.py in the FUSE tests?)

I had this exact thought too -- there's a 2 x 2 matrix of nearly identical code -- but I feel like I've already expanded my Go horizons enough for this "source tree layout" story. :) Perhaps this can be part of a future cleanup phase?

Works for me. I think this is good to merge. Let me know if you'd like help with the Jenkins bits. Thanks.

#16 Updated by Anonymous about 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:244159419c42341baeb388236ad29cc546b7eca1.

Also available in: Atom PDF