Feature #8080

arvbox development environment

Added by Peter Amstutz almost 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 01/06/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 1.0

Description

A new from-scratch docker build of Arvados that actually works.


Subtasks

Task #8081: Review 8080-arvbox in arvados-dev (Resolved, Tom Clegg)

Task #8133: Fix keep-web related tests (Resolved, Tom Clegg)

Associated revisions

Revision c870f616
Added by Peter Amstutz almost 5 years ago

Merge branch '8080-arvbox' closes #8080

Revision b408ce71 (diff)
Added by Peter Amstutz almost 5 years ago

Merge branch '8080-arvbox' closes #8080

Revision e78db692 (diff)
Added by Peter Amstutz almost 5 years ago

Merge branch '8080-arvbox' closes #8080

Revision fe516455 (diff)
Added by Peter Amstutz almost 5 years ago

Fix gitolite-shell path refs #8080

Revision b7be9ca8 (diff)
Added by Peter Amstutz almost 5 years ago

Fix gitolite-shell path refs #8080

Revision 026d7048 (diff)
Added by Peter Amstutz almost 5 years ago

Enable arvbox user to sudo to crunch user. refs #8080

Revision bbc1ea28 (diff)
Added by Peter Amstutz almost 5 years ago

Enable arvbox user to sudo to crunch user. refs #8080

Revision 7b2fd8f0 (diff)
Added by Peter Amstutz almost 5 years ago

Document some arvbox hw/sw requirements refs #8080

Revision 4843665b (diff)
Added by Peter Amstutz almost 5 years ago

Document some arvbox hw/sw requirements refs #8080

Revision 9a0aa9fc (diff)
Added by Peter Amstutz almost 5 years ago

Just create arvbox superuser instead of creating a whole useless database.
Check directly whether creating the arvbox is required. refs #8080

Revision 486acacd (diff)
Added by Peter Amstutz almost 5 years ago

Just create arvbox superuser instead of creating a whole useless database.
Check directly whether creating the arvbox is required. refs #8080

Revision 738bac6d (diff)
Added by Peter Amstutz almost 5 years ago

Add "status" command, refs #8080

Revision b99400a6 (diff)
Added by Peter Amstutz almost 5 years ago

Add "status" command, refs #8080

Revision 0a18c5fb (diff)
Added by Peter Amstutz almost 5 years ago

Fix markdown for arvbox README.md refs #8080

Revision 0bf166a6 (diff)
Added by Peter Amstutz almost 5 years ago

Fix markdown for arvbox README.md refs #8080

Revision 42f2e4db (diff)
Added by Peter Amstutz almost 5 years ago

chown /etc/ssl/private to arvbox refs #8080

Revision 98006086 (diff)
Added by Peter Amstutz almost 5 years ago

chown /etc/ssl/private to arvbox refs #8080

Revision a8655074 (diff)
Added by Peter Amstutz almost 5 years ago

Arvbox fixes: use 'postgres' database when testing if a user exists. Copy
gitolite.rc to the right place. Print out what is being deleted when using
"reset" or "destroy". refs #8080

Revision 80438258 (diff)
Added by Peter Amstutz almost 5 years ago

Arvbox fixes: use 'postgres' database when testing if a user exists. Copy
gitolite.rc to the right place. Print out what is being deleted when using
"reset" or "destroy". refs #8080

Revision a1ca4978 (diff)
Added by Peter Amstutz almost 5 years ago

Use volumes for /var/log/nginx and /etc/ssl/private so that changing ownership
sticks. refs #8080

Revision 290dfcf6 (diff)
Added by Peter Amstutz almost 5 years ago

Use volumes for /var/log/nginx and /etc/ssl/private so that changing ownership
sticks. refs #8080

History

#1 Updated by Peter Amstutz almost 5 years ago

  • Status changed from New to In Progress
  • Story points set to 2.0

#2 Updated by Peter Amstutz almost 5 years ago

  • Assigned To set to Peter Amstutz

#3 Updated by Peter Amstutz almost 5 years ago

  • Description updated (diff)

#4 Updated by Peter Amstutz almost 5 years ago

Most tests pass

Failures (4):
Fail: services/login-sync tests (2s)
Fail: services/fuse tests (220s)
Fail: services/keep-web tests (18s)
Fail: apps/workbench tests (1287s)
Leaving behind temp dirs in /tmp/tmp.P0jlJWskaV

I don't know why login-sync is failing.

FUSE is failing for me on a MagicDirectory test, but this test fails outside Docker as well.

Keep-web tests are failing catastrophically so I suspect there is a missing dependency.

Most of the failing workbench tests are keepweb related, so they probably have the same root cause as keepweb.

#5 Updated by Brett Smith almost 5 years ago

  • Target version changed from 2016-01-06 sprint to 2016-01-20 Sprint

#6 Updated by Brett Smith almost 5 years ago

  • Story points changed from 2.0 to 1.0

#7 Updated by Tom Clegg almost 5 years ago

What's a good recipe for using/testing this?

I might be able to spot the keep-web and login-sync bugs, having started both of those things...

#8 Updated by Peter Amstutz almost 5 years ago

arvbox build
arvbox run-tests --only services/keep-web

(you can add --skip-install after the first run, of course)

#9 Updated by Tom Clegg almost 5 years ago

"build" looked like it worked, but it seems I needed to do something else first: I had to do "arvbox run" before run-tests would run. Otherwise:

tom@nelle:~/src/arvados (master)$ ../arvados-dev/arvbox/bin/arvbox run-tests --skip-install --only services/keep-web
[...git clone stuff]
Checking connectivity... done.
4f5d1301a9e9d93747e1b2772919ac1ce74cdb1a406e8d4ede1a5663311c6537
Checking dependencies:
virtualenv: 1.11.6
go: go version go1.3.3 linux/amd64
gcc: gcc (Debian 4.9.2-10) 4.9.2
fuse.h: /usr/include/fuse/fuse.h
pyconfig.h: /usr/include/x86_64-linux-gnu/python2.7/pyconfig.h
nginx: nginx version: nginx/1.6.2
perl: This is perl 5, version 20, subversion 2 (v5.20.2) built for x86_64-linux-gnu-thread-multi
perl ExtUtils::MakeMaker: 6.98
perl JSON: 2.61
perl LWP: 6.08
perl Net::SSL: 2.85
gitolite: /usr/bin/gitolite
WORKSPACE=/usr/src/arvados
mkdir: cannot create directory '/var/lib/gems/ruby/2.1.0': No such file or directory
Leaving behind temp dirs in /tmp/tmp.5lRBC8qIWb
Fatal: can't create /var/lib/gems/ruby/2.1.0 (does /tmp/tmp.5lRBC8qIWb exist?) (encountered in main at /usr/src/arvados-dev/jenkins/run-tests.sh line 315)

#10 Updated by Tom Clegg almost 5 years ago

I pushed a couple of commits to 8080-arvbox
  • Fixed keep-web tests by not overriding blob_signing_key in the API server's test config
  • Fixed login-sync tests by adding fuse group in Dockerfile

Other feedback:

Revert 7c7c2f10db25a1af16a523677444c802c66c807b (cf. IRC) - "bundle package" helps us avoid Downloading The Internet. (Perhaps the comments below about GEM_HOME will help with whatever trouble you ran into that inspired this?)

For reasons of copyright and maintainability, gitolite.rc should say where it comes from and what's changed from the original. (Ideally we'd apply the changes instead of copying the whole file, but I understand that's painful when the config format is Perl. When we do write that script it should surely go somewhere we can use it in production setups too.)

Is there any reason not to download runit-docker from the internet while building the docker image, like phantomjs and other dependencies, instead of including it as a git subtree?

Please use quoting, especially for stuff like rm -rf $ARVBOX_DATA/var
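To illustrate the quoting point, here is a minimal sketch. The directory name is a made-up example containing a space, not a path from the arvbox scripts, and count_args is a hypothetical helper used only to show the word splitting.

```shell
#!/bin/bash
set -eu

# Hypothetical data directory whose name contains a space.
base=$(mktemp -d)
ARVBOX_DATA="$base/arv box"
mkdir -p "$ARVBOX_DATA/var"

# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted, the expansion is split into two words; quoted, it stays one.
n_unquoted=$(count_args $ARVBOX_DATA/var)
n_quoted=$(count_args "$ARVBOX_DATA/var")
echo "unquoted: $n_unquoted argument(s), quoted: $n_quoted argument(s)"

# Quoted removal; ":?" additionally aborts if the variable is empty or unset,
# so the command can never degrade to "rm -rf /var".
rm -rf "${ARVBOX_DATA:?}/var"
```

With the unquoted form, rm would have been handed two unrelated paths instead of the one intended.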

There seems to be some confusion about the various gem paths (what a surprise!). My ~/.arvbox has
  • some gems like arvbox/gems/ruby/2.1.0/cache/oj-2.14.3.gem
  • some gems like arvbox/gems/ruby/2.1.0/.gem/ruby/2.1.0/cache/oj-2.14.3.gem

Perhaps this is related to "GEMHOME=/var/lib/gems/ruby/2.1.0". Normally, GEMHOME is in run-tests.sh's temp dir. Is there any particular reason to override GEMHOME, VENVDIR, etc., rather than passing something like --temp-dir /var/lib/arvados/run-tests and letting run-tests name its temp dirs with its usual convention?

In keep-setup.sh, there are two "read -rd $'\000' keepservice", but isn't the second version appropriate for both "create" and "update" cases?
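For readers unfamiliar with the idiom, a small sketch of what read -rd $'\000' does with a here-document. The port value and JSON fields below are assumptions for illustration and are not copied from keep-setup.sh.

```shell
#!/bin/bash
set -eu

# Hypothetical port; the real script takes its value from elsewhere.
port=25107

# With a NUL delimiter, read slurps the whole here-document into one variable.
# read returns non-zero when it reaches EOF before seeing the delimiter,
# so "|| true" keeps "set -e" from aborting the script.
read -rd $'\000' keepservice <<EOF || true
{
 "service_host": "localhost",
 "service_port": $port,
 "service_type": "disk"
}
EOF

echo "$keepservice"
```

Since the variable is filled the same way regardless of what is done with it afterwards, one read can plausibly serve both the "create" and the "update" branch.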

Could we patch config/database.yml.example like we do at Hacking prerequisites instead of having another copy of the default stuff?

Now that we're in a docker container, we can use the same port numbers as the install docs, instead of adding new conventions:
  • api is 8000
  • sso is 8900
  • keep-web is 9002

If the "services" array in ready/run-service moves to common.sh (maybe as "service_ports"?), a lot of the other scripts could look at that array instead of having their own copies of the mysterious magic numbers.
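A rough sketch of what such a shared table might look like, assuming bash 4 associative arrays. The name service_ports is the one floated above; port_for is a hypothetical helper, and only the three ports listed above are included.

```shell
#!/bin/bash
set -eu

# Shared table that could live in common.sh. Ports are the ones listed above.
declare -A service_ports=(
  [api]=8000
  [sso]=8900
  [keep-web]=9002
)

# Hypothetical helper: print the port for a service, or fail if it is unknown.
port_for() {
  local name=$1
  if [[ -z "${service_ports[$name]:-}" ]]; then
    echo "unknown service: $name" >&2
    return 1
  fi
  echo "${service_ports[$name]}"
}

api_port=$(port_for api)
echo "API server listens on port $api_port"
```

Each service script would then source the common file and look up its own port rather than hard-coding the number.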

Use existing program names (or package names) instead of new variants -- the proliferation of names is probably confusing enough already
  • keep-web, not keepweb
  • arv-git-httpd, not githttp
  • keepstore, not keep

How do the "sleep 1" in keep1/run-service and the "sleep 2" in keep-setup.sh work together? This looks like a "usually works, sometimes fails mysteriously" kind of thing. If this is how we avoid running concurrent "go get" procs in the same GOPATH or something like that, "flock" would probably be more reliable (and a bit faster).
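A minimal sketch of the flock approach, assuming flock(1) from util-linux is available in the container. The lock file path, the build function, and the echo/sleep stand-ins for the real work are all hypothetical.

```shell
#!/bin/bash
set -eu

workdir=$(mktemp -d)
lock="$workdir/gopath.lock"   # hypothetical lock file shared by both services
log="$workdir/log"

# Stand-in for a step that must not run concurrently (e.g. "go get").
build() {
  (
    flock 9                   # block until we hold the exclusive lock on fd 9
    echo "start $1" >>"$log"
    sleep 1                   # placeholder for the real work
    echo "end $1" >>"$log"
  ) 9>"$lock"                 # lock is released when the subshell exits
}

# Start both at once; flock serializes them without any guessed sleep times.
build keep0 &
build keep1 &
wait

cat "$log"
```

Whichever job gets the lock first runs to completion before the other starts, so the log never shows interleaved start/end lines.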

You can do "killall -HUP keepproxy" after adding a keep service record, to make sure it finds out about new keepstore services as soon as they come up.

Using different storage backend dirs for the two keepstore services would probably be less confusing when using this to work on Keep.

In gitssh-setup "ssh -o stricthostkeychecking=no localhost" should probably have a command at the end like "true". Evidently running a login shell with stdin closed accomplishes the same thing, but "true" would be a bit less mysterious.

Fix whatever it is that prevents starting from scratch and doing "arvbox build; arvbox run-tests" from working (see note-9)

I think it would be helpful to make a wiki page with more information about how this should be expected to work, e.g.:
  • how to: bring up dev env, change some API server code, verify using your dev workbench and workstation browser that the change actually worked.
  • what do the env vars do (ARVBOX_CONTAINER etc)

README: fix path to ./bin/arvbox, point to wiki page for more details

Along with "Workbench is running at http://$localip", how about a set of "export ARVADOS_*=*"

#11 Updated by Tom Clegg almost 5 years ago

I like "set -eu" ... "set -o pipefail" might also be worthwhile?
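A small illustration of what pipefail changes, using bash. The pipeline "false | cat" is just a stand-in for any pipeline whose first stage fails.

```shell
#!/bin/bash

# Without pipefail, a pipeline's exit status is that of its last command,
# so the failure of "false" is hidden by the successful "cat".
set +o pipefail
if false | cat; then without=ok; else without=failed; fi

# With pipefail, the pipeline fails if any stage fails.
set -o pipefail
if false | cat; then with=ok; else with=failed; fi

echo "without pipefail: $without, with pipefail: $with"
```

Combined with "set -e", this means a failing early stage of a pipeline stops the script instead of passing unnoticed.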

#12 Updated by Peter Amstutz almost 5 years ago

Tom Clegg wrote:

I pushed a couple of commits to 8080-arvbox
  • Fixed keep-web tests by not overriding blob_signing_key in the API server's test config
  • Fixed login-sync tests by adding fuse group in Dockerfile

Other feedback:

Revert 7c7c2f10db25a1af16a523677444c802c66c807b (cf. IRC) - "bundle package" helps us avoid Downloading The Internet. (Perhaps the comments below about GEM_HOME will help with whatever trouble you ran into that inspired this?)

There's GEM_HOME and then we have our own GEMHOME that means something slightly different. I think it is fixed now.

For reasons of copyright and maintainability, gitolite.rc should say where it comes from and what's changed from the original. (Ideally we'd apply the changes instead of copying the whole file, but I understand that's painful when the config format is Perl. When we do write that script it should surely go somewhere we can use it in production setups too.)

Added a note.

Is there any reason not to download runit-docker from the internet while building the docker image, like phantomjs and other dependencies, instead of including it as a git subtree?

I ditched runit-docker in favor of a different program that accomplishes a similar task. This is fetched from github with "go get".

Please use quoting, especially for stuff like rm -rf $ARVBOX_DATA/var

Done.

There seems to be some confusion about the various gem paths (what a surprise!). My ~/.arvbox has
  • some gems like arvbox/gems/ruby/2.1.0/cache/oj-2.14.3.gem
  • some gems like arvbox/gems/ruby/2.1.0/.gem/ruby/2.1.0/cache/oj-2.14.3.gem

Perhaps this is related to "GEMHOME=/var/lib/gems/ruby/2.1.0". Normally, GEMHOME is in run-tests.sh's temp dir. Is there any particular reason to override GEMHOME, VENVDIR, etc., rather than passing something like --temp-dir /var/lib/arvados/run-tests and letting run-tests name its temp dirs with its usual convention?

The reason it doesn't use just --temp-dir is so that the gems are reused between running a development server and running the tests.

In keep-setup.sh, there are two "read -rd $'\000' keepservice", but isn't the second version appropriate for both "create" and "update" cases?

Fixed.

Could we patch config/database.yml.example like we do at Hacking prerequisites instead of having another copy of the default stuff?

Fixed.

Now that we're in a docker container, we can use the same port numbers as the install docs, instead of adding new conventions:
  • api is 8000
  • sso is 8900
  • keep-web is 9002

If the "services" array in ready/run-service moves to common.sh (maybe as "service_ports"?), a lot of the other scripts could look at that array instead of having their own copies of the mysterious magic numbers.

Good idea. Done.

Use existing program names (or package names) instead of new variants -- the proliferation of names is probably confusing enough already
  • keep-web, not keepweb
  • arv-git-httpd, not githttp
  • keepstore, not keep

Fixed.

How do the "sleep 1" in keep1/run-service and the "sleep 2" in keep-setup.sh work together? This looks like a "usually works, sometimes fails mysteriously" kind of thing. If this is how we avoid running concurrent "go get" procs in the same GOPATH or something like that, "flock" would probably be more reliable (and a bit faster).

Fixed.

You can do "killall -HUP keepproxy" after adding a keep service record, to make sure it finds out about new keepstore services as soon as they come up.

Done.

Using different storage backend dirs for the two keepstore services would probably be less confusing when using this to work on Keep.

Okay, I changed it, although storing everything twice is not ideal. The right thing to do would be to run with default replication=1, but I think there may be bugs with that still?

In gitssh-setup "ssh -o stricthostkeychecking=no localhost" should probably have a command at the end like "true". Evidently running a login shell with stdin closed accomplishes the same thing, but "true" would be a bit less mysterious.

This is a little wonky. Once gitolite is set up, if you run ssh -o stricthostkeychecking=no git@localhost true you'll get an error because "true" is not a gitolite command. You could do "true" on the first time through, but if it works why try to break it?

Fix whatever it is that prevents starting from scratch and doing "arvbox build; arvbox run-tests" from working (see note-9)

Done (along with lots of other refactoring).

I think it would be helpful to make a wiki page with more information about how this should be expected to work, e.g.:
  • how to: bring up dev env, change some API server code, verify using your dev workbench and workstation browser that the change actually worked.
  • what do the env vars do (ARVBOX_CONTAINER etc)

Will do.

README: fix path to ./bin/arvbox, point to wiki page for more details

Will do.

Along with "Workbench is running at http://$localip", how about a set of "export ARVADOS_*=*"

We could export ARVADOS_API_HOST but unless we export a superuser API token we don't know who the user is going to log in as.

#13 Updated by Tom Clegg almost 5 years ago

Peter Amstutz wrote:

There's GEM_HOME and then we have our own GEMHOME that means something slightly different. I think it is fixed now.

(Yeah, sorry, I meant "comments below about confusion between GEMHOME and GEM_HOME" there, not GEM_HOME.)

By "fixed" do you mean you can put back bundle package --all?

Is there any particular reason to override GEMHOME, VENVDIR, etc., rather than passing something like --temp-dir /var/lib/arvados/run-tests and letting run-tests name its temp dirs with its usual convention?

The reason it doesn't use just --temp-dir is so that the gems are reused between running a development server and running the tests.

Is GEM_HOME not enough to accomplish that? I don't think GEMHOME should be shared with the dev environment.

The confusion between GEMHOME and GEM_HOME is unfortunate but basically GEMHOME is "a fake HOME dir that run-tests tells gem tools about". It behaves somewhat like a virtualenv belonging to the test suite: run-tests can install its own packages there, remove gems that might get in the way like existing versions of arvados gems (which might be different despite having the same version number), etc. GEM_HOME is where the user would normally install gems, and is expected to be shared with other uses. If the goal here is to avoid downloading gems twice for dev+tests, I'd expect GEM_HOME to take care of it.

Normally bundle package --all puts all needed gems in (e.g.) services/api/vendor/cache/ and we run bundle install --local in order to install from there before resorting to the internet. Perhaps arvbox should be taking advantage of this too, instead of trying to share an "installed gems" directory?

Even if it is necessary to override GEMHOME, we should use --temp-dir, and not override VENVDIR and the other ones we don't need to override. That way we won't have to keep the two lists of tempdirs in sync manually (or out of sync in this case, what with tests-venv vs. VENVDIR).

Okay, I changed it, although storing everything twice is not ideal. The right thing to do would be to run with default replication=1, but I think there may be bugs with that still?

The "right thing" depends on what you're developing/testing, I suppose. replication=1 will probably be helpful when finding and fixing any of those replication=1 bugs...

In gitssh-setup "ssh -o stricthostkeychecking=no localhost" should probably have a command at the end like "true". Evidently running a login shell with stdin closed accomplishes the same thing, but "true" would be a bit less mysterious.

This is a little wonky. Once gitolite is set up, if you run ssh -o stricthostkeychecking=no git@localhost true you'll get an error because "true" is not a gitolite command. You could do "true" on the first time through, but if it works why try to break it?

Yeah, I didn't think of that. Since the goal is "less mysterious" I suppose a comment is a better way.

Along with "Workbench is running at http://$localip", how about a set of "export ARVADOS_*=*"

We could export ARVADOS_API_HOST but unless we export a superuser API token we don't know who the user is going to log in as.

I figured the "superuser token", labelled as such, would be handy.

#14 Updated by Peter Amstutz almost 5 years ago

Tom Clegg wrote:

The confusion between GEMHOME and GEM_HOME is unfortunate but basically GEMHOME is "a fake HOME dir that run-tests tells gem tools about". It behaves somewhat like a virtualenv belonging to the test suite: run-tests can install its own packages there, remove gems that might get in the way like existing versions of arvados gems (which might be different despite having the same version number), etc. GEM_HOME is where the user would normally install gems, and is expected to be shared with other uses. If the goal here is to avoid downloading gems twice for dev+tests, I'd expect GEM_HOME to take care of it.

Ok, now it uses --temp and only sets GEM_HOME. Seems to be behaving. Thanks for explaining that.

Okay, I changed it, although storing everything twice is not ideal. The right thing to do would be to run with default replication=1, but I think there may be bugs with that still?

The "right thing" depends on what you're developing/testing, I suppose. replication=1 will probably be helpful when finding and fixing any of those replication=1 bugs...

Here's a compromise: it runs two keep servers but with "default_collection_replication: 1" so correctly behaving clients only write the block once.

This is a little wonky. Once gitolite is set up, if you run ssh -o stricthostkeychecking=no git@localhost true you'll get an error because "true" is not a gitolite command. You could do "true" on the first time through, but if it works why try to break it?

Yeah, I didn't think of that. Since the goal is "less mysterious" I suppose a comment is a better way.

Added a comment.

Along with "Workbench is running at http://$localip", how about a set of "export ARVADOS_*=*"

We could export ARVADOS_API_HOST but unless we export a superuser API token we don't know who the user is going to log in as.

I figured the "superuser token", labelled as such, would be handy.

I feel like this is more likely to just lead to confusion when people use the superuser token, then wonder why they can't find their data that's owned by the system user and not the user they logged in as.

#15 Updated by Tom Clegg almost 5 years ago

arvados-dev|c21c967 has "Please enter a commit message" etc. in its commit message. (I think git-hooks will force you to rebase/reword this. BTW do you know why git can't recognize its own comments?)

Even with the failing workbench+keep-web test(s) I think you should merge this. It doesn't interfere with anything, so at this point leaving it unmerged just makes it slightly less convenient to use...

#16 Updated by Peter Amstutz almost 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados-dev|commit:c870f6167d6c30bd6ba3522114d162712b025705.
