Feature #8080

closed

arvbox development environment

Added by Peter Amstutz over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 01/06/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 1.0

Description

A new from-scratch docker build of Arvados that actually works.


Subtasks 2 (0 open, 2 closed)

Task #8081: Review 8080-arvbox in arvados-dev (Resolved, Tom Clegg, 01/06/2016)
Task #8133: Fix keep-web related tests (Resolved, Tom Clegg, 01/06/2016)

Actions #1

Updated by Peter Amstutz over 6 years ago

  • Status changed from New to In Progress
  • Story points set to 2.0
Actions #2

Updated by Peter Amstutz over 6 years ago

  • Assigned To set to Peter Amstutz
Actions #3

Updated by Peter Amstutz over 6 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz over 6 years ago

Most tests pass

Failures (4):
Fail: services/login-sync tests (2s)
Fail: services/fuse tests (220s)
Fail: services/keep-web tests (18s)
Fail: apps/workbench tests (1287s)
Leaving behind temp dirs in /tmp/tmp.P0jlJWskaV

I don't know why login-sync is failing.

FUSE is failing for me on a MagicDirectory test, but this test fails outside Docker as well.

Keep-web tests are failing catastrophically so I suspect there is a missing dependency.

Most of the failing workbench tests are keep-web related, so they probably have the same root cause as the keep-web failures.

Actions #5

Updated by Brett Smith over 6 years ago

  • Target version changed from 2016-01-06 sprint to 2016-01-20 Sprint
Actions #6

Updated by Brett Smith over 6 years ago

  • Story points changed from 2.0 to 1.0
Actions #7

Updated by Tom Clegg over 6 years ago

What's a good recipe for using/testing this?

I might be able to spot the keep-web and login-sync bugs, having started both of those things...

Actions #8

Updated by Peter Amstutz over 6 years ago

arvbox build
arvbox run-tests --only services/keep-web

(you can add --skip-install after the first run, of course)

Actions #9

Updated by Tom Clegg over 6 years ago

"build" looked like it worked, but it seems I needed to do something else as well: I had to do "arvbox run" before run-tests would run. Otherwise:

tom@nelle:~/src/arvados (master)$ ../arvados-dev/arvbox/bin/arvbox run-tests --skip-install --only services/keep-web
[...git clone stuff]
Checking connectivity... done.
4f5d1301a9e9d93747e1b2772919ac1ce74cdb1a406e8d4ede1a5663311c6537
Checking dependencies:
virtualenv: 1.11.6
go: go version go1.3.3 linux/amd64
gcc: gcc (Debian 4.9.2-10) 4.9.2
fuse.h: /usr/include/fuse/fuse.h
pyconfig.h: /usr/include/x86_64-linux-gnu/python2.7/pyconfig.h
nginx: nginx version: nginx/1.6.2
perl: This is perl 5, version 20, subversion 2 (v5.20.2) built for x86_64-linux-gnu-thread-multi
perl ExtUtils::MakeMaker: 6.98
perl JSON: 2.61
perl LWP: 6.08
perl Net::SSL: 2.85
gitolite: /usr/bin/gitolite
WORKSPACE=/usr/src/arvados
mkdir: cannot create directory '/var/lib/gems/ruby/2.1.0': No such file or directory
Leaving behind temp dirs in /tmp/tmp.5lRBC8qIWb
Fatal: can't create /var/lib/gems/ruby/2.1.0 (does /tmp/tmp.5lRBC8qIWb exist?) (encountered in main at /usr/src/arvados-dev/jenkins/run-tests.sh line 315)
Actions #10

Updated by Tom Clegg over 6 years ago

I pushed a couple of commits to 8080-arvbox
  • Fixed keep-web tests by not overriding blob_signing_key in the API server's test config
  • Fixed login-sync tests by adding fuse group in Dockerfile

Other feedback:

Revert 7c7c2f10db25a1af16a523677444c802c66c807b (cf. IRC) - "bundle package" helps us avoid Downloading The Internet. (Perhaps the comments below about GEM_HOME will help with whatever trouble you ran into that inspired this?)

For reasons of copyright and maintainability, gitolite.rc should say where it comes from and what's changed from the original. (Ideally we'd apply the changes instead of copying the whole file, but I understand that's painful when the config format is Perl. When we do write that script it should surely go somewhere we can use it in production setups too.)

Is there any reason not to download runit-docker from the internet while building the docker image, like phantomjs and other dependencies, instead of including it as a git subtree?

Please use quoting, especially for stuff like rm -rf $ARVBOX_DATA/var
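
A minimal sketch of the quoting concern, assuming a hypothetical data directory whose path contains a space. The directory names and the `${ARVBOX_DATA:?}` guard are illustrative assumptions, not taken from the branch.

```shell
#!/bin/bash
set -eu

# Hypothetical data directory with a space in its path (illustration only).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
ARVBOX_DATA="$workdir/arvbox data"
mkdir -p "$ARVBOX_DATA/var" "$workdir/arvbox"
touch "$workdir/arvbox/keep-me"

# Unquoted, $ARVBOX_DATA/var would split into ".../arvbox" and "data/var",
# so rm -rf would delete $workdir/arvbox instead of the intended directory.
# Quoted, it is a single argument; :? aborts if ARVBOX_DATA is unset or empty,
# so the command can never expand to "rm -rf /var".
rm -rf "${ARVBOX_DATA:?}/var"

[ ! -e "$ARVBOX_DATA/var" ] && echo "var removed"
[ -e "$workdir/arvbox/keep-me" ] && echo "sibling untouched"
```

With the quotes removed, the same command would leave "arvbox data/var" in place and remove the unrelated "arvbox" directory.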

There seems to be some confusion about the various gem paths (what a surprise!). My ~/.arvbox has
  • some gems like arvbox/gems/ruby/2.1.0/cache/oj-2.14.3.gem
  • some gems like arvbox/gems/ruby/2.1.0/.gem/ruby/2.1.0/cache/oj-2.14.3.gem

Perhaps this is related to "GEMHOME=/var/lib/gems/ruby/2.1.0". Normally, GEMHOME is in run-tests.sh's temp dir. Is there any particular reason to override GEMHOME, VENVDIR, etc., rather than passing something like --temp-dir /var/lib/arvados/run-tests and letting run-tests name its temp dirs with its usual convention?

In keep-setup.sh, there are two "read -rd $'\000' keepservice", but isn't the second version appropriate for both "create" and "update" cases?
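
For context, `read -rd $'\000'` is a common bash idiom for reading a whole here-document into one variable. A sketch of a single read serving both cases, assuming illustrative field values (the address and port below are made up). Because the delimiter never appears in the here-document, `read` consumes everything up to end of input and then returns a non-zero status, which matters in a script that uses "set -e".

```shell
#!/bin/bash
set -eu

# Hypothetical values, for illustration only.
localip=10.0.0.5
port=25107

# read returns non-zero at end of input because the delimiter is never found,
# so "|| true" keeps "set -e" from aborting the script here.
read -rd $'\000' keepservice <<EOF || true
{
 "service_host": "$localip",
 "service_port": $port,
 "service_type": "disk"
}
EOF

# The same body can then feed either a "create" or an "update" call.
echo "$keepservice"
```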

Could we patch config/database.yml.example like we do at Hacking prerequisites instead of having another copy of the default stuff?

Now that we're in a docker container, we can use the same port numbers as the install docs, instead of adding new conventions:
  • api is 8000
  • sso is 8900
  • keep-web is 9002

If the "services" array in ready/run-service moves to common.sh (maybe as "service_ports"?), a lot of the other scripts could look at that array instead of having their own copies of the mysterious magic numbers.
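
One possible shape for such a shared table, as a sketch only. It assumes bash 4 or later for associative arrays; the `port_for` helper is hypothetical, and only the three ports listed above are included.

```shell
#!/bin/bash
set -eu

# Sketch of a shared table that could live in common.sh.
# The array name follows the "service_ports" suggestion above.
declare -A service_ports=(
  [api]=8000
  [sso]=8900
  [keep-web]=9002
)

# Hypothetical helper: look up a port by service name, failing loudly
# on unknown names instead of silently using an empty value.
port_for() {
  local name=$1
  if [[ -z "${service_ports[$name]:-}" ]]; then
    echo "unknown service: $name" >&2
    return 1
  fi
  echo "${service_ports[$name]}"
}

echo "keep-web listens on port $(port_for keep-web)"
```

Scripts that source the file could then call `port_for api` rather than repeating the number.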

Use existing program names (or package names) instead of new variants -- the proliferation of names is probably confusing enough already
  • keep-web, not keepweb
  • arv-git-httpd, not githttp
  • keepstore, not keep

How do the "sleep 1" in keep1/run-service and the "sleep 2" in keep-setup.sh work together? This looks like a "usually works, sometimes fails mysteriously" kind of thing. If this is how we avoid running concurrent "go get" procs in the same GOPATH or something like that, "flock" would probably be more reliable (and a bit faster).
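
A sketch of the flock approach, assuming the util-linux `flock` command is available. The lock file location and the `build_step` function are hypothetical stand-ins for whatever step must not run concurrently.

```shell
#!/bin/bash
set -eu

# Hypothetical lock and log files; a real lock might live next to the GOPATH.
lockfile=$(mktemp)
logfile=$(mktemp)
trap 'rm -f "$lockfile" "$logfile"' EXIT

# Stand-in for a step like "go get" that must not run concurrently.
build_step() {
  echo "start $1" >> "$logfile"
  sleep 0.2
  echo "end $1" >> "$logfile"
}

# Run a command while holding an exclusive lock on file descriptor 9.
# The lock is released when the subshell exits and the descriptor closes.
run_locked() {
  (
    flock 9
    "$@"
  ) 9>"$lockfile"
}

# Each caller blocks until it holds the lock, so the two runs cannot overlap
# and neither has to guess how long the other needs.
run_locked build_step keep1 &
run_locked build_step keep2 &
wait

cat "$logfile"
```

Whichever caller gets the lock first runs to completion before the other starts, so each "start" line in the log is followed directly by its matching "end" line.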

You can do "killall -HUP keepproxy" after adding a keep service record, to make sure it finds out about new keepstore services as soon as they come up.

Using different storage backend dirs for the two keepstore services would probably be less confusing when using this to work on Keep.

In gitssh-setup "ssh -o stricthostkeychecking=no localhost" should probably have a command at the end like "true". Evidently running a login shell with stdin closed accomplishes the same thing, but "true" would be a bit less mysterious.

Fix whatever it is that prevents starting from scratch and doing "arvbox build; arvbox run-tests" from working (see note-9)

I think it would be helpful to make a wiki page with more information about how this should be expected to work, e.g.:
  • how to: bring up dev env, change some API server code, verify using your dev workbench and workstation browser that the change actually worked.
  • what do the env vars do (ARVBOX_CONTAINER etc)

README: fix path to ./bin/arvbox, point to wiki page for more details

Along with "Workbench is running at http://$localip", how about a set of "export ARVADOS_*=*"

Actions #11

Updated by Tom Clegg over 6 years ago

I like "set -eu" ... "set -o pipefail" might also be worthwhile?
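
A small sketch of what "set -o pipefail" changes, using `false | cat` as a stand-in for any pipeline whose first command fails:

```shell
#!/bin/bash
set -eu

# Without pipefail, a pipeline's status is that of its last command,
# so the failure of "false" is masked by the success of "cat".
if false | cat; then
  without="success"
else
  without="failure"
fi

set -o pipefail

# With pipefail, the pipeline fails if any command in it fails.
if false | cat; then
  with="success"
else
  with="failure"
fi

echo "without pipefail: $without"
echo "with pipefail: $with"
```

Combined with "set -e", this means a failing command in the middle of a pipeline stops the script instead of passing unnoticed.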

Actions #12

Updated by Peter Amstutz over 6 years ago

Tom Clegg wrote:

I pushed a couple of commits to 8080-arvbox
  • Fixed keep-web tests by not overriding blob_signing_key in the API server's test config
  • Fixed login-sync tests by adding fuse group in Dockerfile

Other feedback:

Revert 7c7c2f10db25a1af16a523677444c802c66c807b (cf. IRC) - "bundle package" helps us avoid Downloading The Internet. (Perhaps the comments below about GEM_HOME will help with whatever trouble you ran into that inspired this?)

There's GEM_HOME and then we have our own GEMHOME that means something slightly different. I think it is fixed now.

For reasons of copyright and maintainability, gitolite.rc should say where it comes from and what's changed from the original. (Ideally we'd apply the changes instead of copying the whole file, but I understand that's painful when the config format is Perl. When we do write that script it should surely go somewhere we can use it in production setups too.)

Added a note.

Is there any reason not to download runit-docker from the internet while building the docker image, like phantomjs and other dependencies, instead of including it as a git subtree?

I ditched runit-docker in favor of a different program that accomplishes a similar task. This is fetched from github with "go get".

Please use quoting, especially for stuff like rm -rf $ARVBOX_DATA/var

Done.

There seems to be some confusion about the various gem paths (what a surprise!). My ~/.arvbox has
  • some gems like arvbox/gems/ruby/2.1.0/cache/oj-2.14.3.gem
  • some gems like arvbox/gems/ruby/2.1.0/.gem/ruby/2.1.0/cache/oj-2.14.3.gem

Perhaps this is related to "GEMHOME=/var/lib/gems/ruby/2.1.0". Normally, GEMHOME is in run-tests.sh's temp dir. Is there any particular reason to override GEMHOME, VENVDIR, etc., rather than passing something like --temp-dir /var/lib/arvados/run-tests and letting run-tests name its temp dirs with its usual convention?

The reason it doesn't use just --temp-dir is so that the gems are reused between running a development server and running the tests.

In keep-setup.sh, there are two "read -rd $'\000' keepservice", but isn't the second version appropriate for both "create" and "update" cases?

Fixed.

Could we patch config/database.yml.example like we do at Hacking prerequisites instead of having another copy of the default stuff?

Fixed.

Now that we're in a docker container, we can use the same port numbers as the install docs, instead of adding new conventions:
  • api is 8000
  • sso is 8900
  • keep-web is 9002

If the "services" array in ready/run-service moves to common.sh (maybe as "service_ports"?), a lot of the other scripts could look at that array instead of having their own copies of the mysterious magic numbers.

Good idea. Done.

Use existing program names (or package names) instead of new variants -- the proliferation of names is probably confusing enough already
  • keep-web, not keepweb
  • arv-git-httpd, not githttp
  • keepstore, not keep

Fixed.

How do the "sleep 1" in keep1/run-service and the "sleep 2" in keep-setup.sh work together? This looks like a "usually works, sometimes fails mysteriously" kind of thing. If this is how we avoid running concurrent "go get" procs in the same GOPATH or something like that, "flock" would probably be more reliable (and a bit faster).

Fixed.

You can do "killall -HUP keepproxy" after adding a keep service record, to make sure it finds out about new keepstore services as soon as they come up.

Done.

Using different storage backend dirs for the two keepstore services would probably be less confusing when using this to work on Keep.

Okay, I changed it, although storing everything twice is not ideal. The right thing to do would be to run with default replication=1, but I think there may be bugs with that still?

In gitssh-setup "ssh -o stricthostkeychecking=no localhost" should probably have a command at the end like "true". Evidently running a login shell with stdin closed accomplishes the same thing, but "true" would be a bit less mysterious.

This is a little wonky. Once gitolite is set up, if you run ssh -o stricthostkeychecking=no git@localhost true you'll get an error because "true" is not a gitolite command. You could do "true" on the first time through, but if it works why try to break it?

Fix whatever it is that prevents starting from scratch and doing "arvbox build; arvbox run-tests" from working (see note-9)

Done (along with lots of other refactoring).

I think it would be helpful to make a wiki page with more information about how this should be expected to work, e.g.:
  • how to: bring up dev env, change some API server code, verify using your dev workbench and workstation browser that the change actually worked.
  • what do the env vars do (ARVBOX_CONTAINER etc)

Will do.

README: fix path to ./bin/arvbox, point to wiki page for more details

Will do.

Along with "Workbench is running at http://$localip", how about a set of "export ARVADOS_*=*"

We could export ARVADOS_API_HOST but unless we export a superuser API token we don't know who the user is going to log in as.

Actions #13

Updated by Tom Clegg over 6 years ago

Peter Amstutz wrote:

There's GEM_HOME and then we have our own GEMHOME that means something slightly different. I think it is fixed now.

(Yeah, sorry, I meant "comments below about confusion between GEMHOME and GEM_HOME" there, not GEM_HOME.)

By "fixed" do you mean you can put back bundle package --all?

Is there any particular reason to override GEMHOME, VENVDIR, etc., rather than passing something like --temp-dir /var/lib/arvados/run-tests and letting run-tests name its temp dirs with its usual convention?

The reason it doesn't use just --temp-dir is so that the gems are reused between running a development server and running the tests.

Is GEM_HOME not enough to accomplish that? I don't think GEMHOME should be shared with the dev environment.

The confusion between GEMHOME and GEM_HOME is unfortunate but basically GEMHOME is "a fake HOME dir that run-tests tells gem tools about". It behaves somewhat like a virtualenv belonging to the test suite: run-tests can install its own packages there, remove gems that might get in the way like existing versions of arvados gems (which might be different despite having the same version number), etc. GEM_HOME is where the user would normally install gems, and is expected to be shared with other uses. If the goal here is to avoid downloading gems twice for dev+tests, I'd expect GEM_HOME to take care of it.

Normally bundle package --all puts all needed gems in (e.g.) services/api/vendor/cache/ and we run bundle install --local in order to install from there before resorting to the internet. Perhaps arvbox should be taking advantage of this too, instead of trying to share an "installed gems" directory?

Even if it is necessary to override GEMHOME, we should use --temp-dir, and not override VENVDIR and the other ones we don't need to override. That way we won't have to keep the two lists of tempdirs in sync manually (or out of sync in this case, what with tests-venv vs. VENVDIR).

Okay, I changed it, although storing everything twice is not ideal. The right thing to do would be to run with default replication=1, but I think there may be bugs with that still?

The "right thing" depends on what you're developing/testing, I suppose. replication=1 will probably be helpful when finding and fixing any of those replication=1 bugs...

In gitssh-setup "ssh -o stricthostkeychecking=no localhost" should probably have a command at the end like "true". Evidently running a login shell with stdin closed accomplishes the same thing, but "true" would be a bit less mysterious.

This is a little wonky. Once gitolite is set up, if you run ssh -o stricthostkeychecking=no git@localhost true you'll get an error because "true" is not a gitolite command. You could do "true" on the first time through, but if it works why try to break it?

Yeah, I didn't think of that. Since the goal is "less mysterious" I suppose a comment is a better way.

Along with "Workbench is running at http://$localip", how about a set of "export ARVADOS_*=*"

We could export ARVADOS_API_HOST but unless we export a superuser API token we don't know who the user is going to log in as.

I figured the "superuser token", labelled as such, would be handy.

Actions #14

Updated by Peter Amstutz over 6 years ago

Tom Clegg wrote:

The confusion between GEMHOME and GEM_HOME is unfortunate but basically GEMHOME is "a fake HOME dir that run-tests tells gem tools about". It behaves somewhat like a virtualenv belonging to the test suite: run-tests can install its own packages there, remove gems that might get in the way like existing versions of arvados gems (which might be different despite having the same version number), etc. GEM_HOME is where the user would normally install gems, and is expected to be shared with other uses. If the goal here is to avoid downloading gems twice for dev+tests, I'd expect GEM_HOME to take care of it.

Ok, now it uses --temp and only sets GEM_HOME. Seems to be behaving. Thanks for explaining that.

Okay, I changed it, although storing everything twice is not ideal. The right thing to do would be to run with default replication=1, but I think there may be bugs with that still?

The "right thing" depends on what you're developing/testing, I suppose. replication=1 will probably be helpful when finding and fixing any of those replication=1 bugs...

Here's a compromise, it runs two keep servers but with "default_collection_replication: 1" so correctly behaving clients only write the block once.

This is a little wonky. Once gitolite is set up, if you run ssh -o stricthostkeychecking=no git@localhost true you'll get an error because "true" is not a gitolite command. You could do "true" on the first time through, but if it works why try to break it?

Yeah, I didn't think of that. Since the goal is "less mysterious" I suppose a comment is a better way.

Added a comment.

Along with "Workbench is running at http://$localip", how about a set of "export ARVADOS_*=*"

We could export ARVADOS_API_HOST but unless we export a superuser API token we don't know who the user is going to log in as.

I figured the "superuser token", labelled as such, would be handy.

I feel like this is more likely to just lead to confusion when people use the superuser token, then wonder why they can't find their data, which is owned by the system user and not the user they logged in as.

Actions #15

Updated by Tom Clegg over 6 years ago

arvados-dev|c21c967 has "Please enter a commit message" etc. in its commit message. (I think git-hooks will force you to rebase/reword this. BTW do you know why git can't recognize its own comments?)

Even with the failing workbench+keep-web test(s) I think you should merge this. It doesn't interfere with anything, so at this point leaving it unmerged just makes it slightly less convenient to use...

Actions #16

Updated by Peter Amstutz over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados-dev|commit:c870f6167d6c30bd6ba3522114d162712b025705.
