Idea #17344
[boot] Make arvados-server-easy package suitable for demo use case
Status: Closed
Added by Tom Clegg almost 4 years ago. Updated almost 2 years ago.
Description
- Install arv-mount so the a-d-c loopback driver can use it
- Avoid leaving the system in an inconvenient state if arvados-server init doesn't go well
- Save a docker image (alpine linux? hello world?) during "init", and use it instead of arvados/jobs in diagnostics
- Document firewall / accessible port requirements
- Sanity-check dns/firewall early in arvados-server init
- Remove setup roadblocks (e.g., use PAM instead of Google API keys)
- Fix internal/external client detection so remote clients don't try to connect to keepstore at 0.0.0.0:9010
- Link "next steps" section to relevant doc pages
- Add "make an admin user" to next steps
- Review/remove obsolete package dependencies (libpython2.7, *-dev?)
Files
dispatch-cloud.log (44.4 KB), Lucas Di Pentima, 09/06/2022 03:48 PM
Related issues
Updated by Tom Clegg almost 4 years ago
- Related to Idea #16306: [install] Build all-in-one server package using arvados-server install/boot in production mode added
Updated by Tom Clegg almost 4 years ago
- Related to Idea #15941: arvados-boot added
Updated by Peter Amstutz over 3 years ago
- Target version deleted (Arvados Future Sprints)
Updated by Tom Clegg over 2 years ago
- Related to Idea #18337: Easy entry into Arvados ecosystem added
Updated by Tom Clegg over 2 years ago
- Status changed from New to In Progress
- Description updated (diff)
Updated by Tom Clegg over 2 years ago
17344-easy-demo @ 4a9acc11eaba55a152851256d19c8fdad3b9f863 -- developer-run-tests: #3231
Updated by Lucas Di Pentima over 2 years ago
Sorry for the delay, here are some comments:
- The ticket mentions a "demo" mode; is the "single-host production" auto install also the demo? I think the "demo mode" could be configured to set the first user as an admin, and also auto-activate new users.
- Could we add the postgresql & docker.io packages as dependencies so they get auto-installed when necessary? If we aim to do a single-node install, those dependencies are needed on the same host, or are you thinking of another possibility?
- In lib/install/deps.go:L647, do you think we could use a dynamic number of parallel jobs depending on the available CPU cores? I think it would be beneficial if we later decide to use a high-CPU worker for the package build pipeline.
- Question: the version number selected for the package is "2.1.0"; is this due to the branch being created from 16652's branch, which was started in March?
- While thinking about ways to make the diagnostics tool usable anywhere, I thought about 2 ideas:
  - Given that the alpine docker image is so small (5.6 MB), we could somehow embed it in our arvados-client so that it can upload it to Keep if necessary.
  - If we don't want binary blobs inside our own binary, we could use a tool like skopeo (https://github.com/containers/skopeo) to download it to the local filesystem instead of needing the docker daemon. Although it's an interesting project, I guess having to install it (and its dependencies) would be as annoying as installing docker to get the same effect? Not sure if it can be used as a library just for the purpose of downloading docker images from the registry.
- In lib/install/init.go:L118-125, wouldn't it be better to iterate over a list of port numbers? AFAICT, if ports 4440 & 443 are already taken, the current code doesn't fail.
- After initialization, the message is: "Setup complete, you can access wb at xxxx"... do you think it would be useful to also suggest that the admin do a diagnostics run? Or maybe execute it automatically before the "setup complete" message?
- The docs say that the user should be set up by username, but when I tried I got this (the docs' example lacks --user):
  root@debian-s-4vcpu-8gb-nyc3-01:~# arv sudo user setup lucas
  Top level ::CompositeIO is deprecated, require 'multipart/post' and use `Multipart::Post::CompositeReadIO` instead!
  Top level ::Parts is deprecated, require 'multipart/post' and use `Multipart::Post::Parts` instead!
  Error: //railsapi.internal/arvados/v1/users/setup: 422 Unprocessable Entity: #<ArgumentError: Required uuid or user> (req-1s86olhxcvhrlap8h424)
- The initial user wasn't set up as an admin user, so I think the docs could also say how to set a user as admin via the CLI?
- In the docs section about customizing the cluster, maybe we can link some of those bullet points to sections of the documentation about manual install/config?
Updated by Tom Clegg over 2 years ago
I haven't been thinking of this as a separate "demo mode" per se -- rather, getting the single-node production install far enough along to use as a demo, but not necessarily functional enough to recommend for production yet (e.g., doesn't handle database migrations yet).
Activation/admin setup could definitely be made smoother/easier. If possible I'd like to solve this in the secure/private case, rather than lean on insecure/open settings for the sake of convenience. Ideas:
- make a command more like arv sudo user setup [--admin] $username
- make arv sudo user setup $username work even if it's run before the user's first login (we made a system for this so we could pre-approve people based on their Google account address, but I'm not sure whether it works in the PAM case)
- option to auto-activate + auto-admin when using PAM and the user is in a specified group (like "sudo" or "adm")
- an arv sudo ... command that [creates a new user] and prints a https://wb2/token?api_token=... link to log you in right away
postgresql & docker.io packages as dependencies so it gets auto-installed when necessary
Both postgresql server and docker daemon seem a bit much to install where they're not needed. Depending on how you define "single-node install", postgresql server might be on a different host, or a cloud service. Docker isn't needed on server nodes in normal usage, only for the sake of diagnostics. (Also, although we're not there yet, my intent is to make a multi-node cluster something like "on each host, install arvados-server-easy, then do this "join" command".)
I was even wondering if we could remove the gitolite dependency (and its annoying interactive prompt during package install) and automatically disable the git features if it's not installed.
How about making the install instructions say "apt install postgresql docker.io arvados-server-easy", with notes about omitting them (or removing them afterward) if not needed?
dynamic amount of parallel jobs depending on the available cpu cores
Oh yeah, good catch. Done.
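For illustration, a minimal sketch of deriving the parallelism from the host's CPU count, assuming a make-style -j flag (the actual change in lib/install/deps.go may look different):

    package main

    import (
        "fmt"
        "os/exec"
        "runtime"
        "strconv"
    )

    func main() {
        // Use as many parallel jobs as there are CPU cores on the build host,
        // instead of a hard-coded number.
        njobs := runtime.NumCPU()
        cmd := exec.Command("make", "-j"+strconv.Itoa(njobs))
        fmt.Println("would run:", cmd.Args)
        // A real build step would call cmd.Run() and check the error.
    }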
version number selected for the package is "2.1.0"
Yes, it uses the same rules as the existing package scripts: for real published packages the caller should specify the version (arvados-package build -package-version=2.4.1); otherwise we use source:build/version-at-commit.sh to guess something based on the git history.
embed alpine docker image
Hm, I kinda like this idea. Is there an even lighter image that would be useful for testing? It really doesn't need to do much. Yes! there is https://hub.docker.com/_/hello-world -- "docker save" makes a 24064 byte .tar file.
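As a sketch of the embedding idea (not the actual arvados-client code; the file name and variable are illustrative), a "docker save" tarball produced ahead of time could be compiled into the binary with go:embed:

    package main

    import (
        _ "embed"
        "fmt"
    )

    // hello_world.tar is assumed to have been produced beforehand with
    // something like "docker save hello-world -o hello_world.tar".
    //
    //go:embed hello_world.tar
    var helloWorldImage []byte

    func main() {
        // Diagnostics could upload this embedded image to Keep, or load it
        // into the local docker daemon, instead of pulling arvados/jobs.
        fmt.Printf("embedded test image: %d bytes\n", len(helloWorldImage))
    }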
better to iterate over a list of port numbers? AFAICT, if ports 4440 & 443 are already taken, the current code doesn't fail
Oops, yes. Fixed. And now it tests all of 4440-4460, not just 4440.
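A minimal sketch of that kind of scan (names are illustrative, not the actual lib/install/init.go code):

    package main

    import (
        "fmt"
        "net"
    )

    // firstFreePort walks a range of candidate ports and returns the first one
    // that can actually be bound, instead of assuming the first choice is free.
    func firstFreePort(lo, hi int) (int, error) {
        for port := lo; port <= hi; port++ {
            ln, err := net.Listen("tcp", fmt.Sprintf(":%d", port))
            if err != nil {
                continue // already in use, try the next one
            }
            ln.Close()
            return port, nil
        }
        return 0, fmt.Errorf("no free port in range %d-%d", lo, hi)
    }

    func main() {
        port, err := firstFreePort(4440, 4460)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Println("using port", port)
    }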
suggest the admin to do a diagnostics run? Or maybe execute it automatically before the "setup complete" message?
Suggesting seems good -- added. I'm not sure about doing it automatically. I like the idea of teaching the user to use 'arv sudo diagnostics' themself early in the game.
arv sudo user setup lucas
Oh yeah. I wrote that in the docs because it would be nice if it really looked that way. Currently I think you need to say --uuid {paste_uuid_here}, and getting the UUID was too annoying to document.
In the docs section about customizing the cluster, maybe we can have some of those bulletpoints linked
Added some links. The existing doc pages aren't exactly right for this context (e.g., telling you to install arvados-dispatch-cloud) but it's a start.
17344-easy-demo @ a2d23c038780134c812249e74d9e6d1b7cad69b6 -- developer-run-tests: #3240
Updated by Tom Clegg over 2 years ago
17344-easy-demo @ d15f485909cf84aeda62c0a843f384cb218e0125 -- developer-run-tests: #3241
Removes some dev-only/outdated package dependencies
Updated by Tom Clegg over 2 years ago
17344-easy-demo @ c966970d64c21d7adaf1c3c8b737aa9e7c166f0e
Adds a -create-db=false option, with connection info accepted from POSTGRES_HOST/USER/DB/PASSWORD env vars.
Updated by Tom Clegg over 2 years ago
- Target version changed from 2022-07-20 to 2022-08-03 Sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-08-03 Sprint to 2022-08-17 sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-08-17 sprint to 2022-08-31 sprint
Updated by Tom Clegg about 2 years ago
17344-easy-demo @ 19c5342a76ab9474c3c8eb5c0e7903c58203a055 -- developer-run-tests: #3273
This makes remote upload/download work in a cloud VM demo install:
- wb2 client is recognized as external, so controller tells it to use keepproxy
- keepproxy is recognized as internal, so controller tells it to use keepstore
- Autocert requires the external controller hostname to resolve to a publicly routable IP address that lands on the controller host.
- The publicly routable IP address is not bound to a local interface: when the controller host itself connects to the external URL, traffic goes through an external gateway, and the remote address seen by Nginx is not recognizable as a local address.
- Even if we could identify it (e.g., x-forwarded-for), we don't want server-to-server traffic going through that external gateway anyway.
The current/old deployment strategy is to fix this with split-horizon DNS. That's not a suitable approach for an easy-install / quick demo scenario.
The solution here is to have "inside" clients (i.e., server components) connect to the server's network interface rather than resolving the external URL host, but validate the presented TLS certificate based on the external URL host. A new env var "ARVADOS_SERVER_ADDRESS" indicates the address clients should connect to.
To follow through with this, we also need to support ARVADOS_SERVER_ADDRESS in Python and Ruby SDKs, so arv-mount (on shell node, worker VM), ruby arv cli (on server node, shell node), and workbench1 work without split horizon DNS or unnecessary routing through the public IP gateway.
As this branch stands so far, only Go services know how to do this, which is the minimum we need, since keepproxy outright breaks without it.
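To make the mechanism concrete, here is a rough sketch of how a Go client could dial a fixed server address while still validating the certificate against the external URL host (illustrative only, not the actual implementation in this branch):

    package main

    import (
        "context"
        "net"
        "net/http"
        "os"
        "time"
    )

    // newClient returns an HTTP client that, when ARVADOS_SERVER_ADDRESS is
    // set, opens TCP connections to that address for every request. TLS
    // verification still uses the host name from the request URL, because
    // http.Transport performs the handshake on top of whatever connection
    // DialContext returns, using the URL host as the expected server name.
    func newClient() *http.Client {
        dialAddr := os.Getenv("ARVADOS_SERVER_ADDRESS") // e.g. "10.1.2.3:443"
        dialer := &net.Dialer{Timeout: 10 * time.Second}
        return &http.Client{
            Transport: &http.Transport{
                DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
                    if dialAddr != "" {
                        addr = dialAddr // connect to the server's own interface
                    }
                    return dialer.DialContext(ctx, network, addr)
                },
            },
        }
    }

    func main() {
        _ = newClient()
    }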
Updated by Tom Clegg about 2 years ago
17344-easy-demo @ 124e87dbafd6c04c9937f45e90f2662c715bea90 -- developer-run-tests: #3275
Disables an sdk/python keepclient test that relies on sending the X-External-Client header to persuade controller to treat it as an external client and send keepproxy info instead of keepstore info, all so it can test that the discovery code notices and sets the using_proxy flag.
We have this ARVADOS_EXTERNAL_CLIENT env var / settings entry that causes the Python client to set that header. But on a real cluster (and arvbox), Nginx deletes the client-provided header and replaces it with 0 or 1 depending on the remote IP address. So ARVADOS_EXTERNAL_CLIENT has only ever worked in the test suite.
Since ARVADOS_EXTERNAL_CLIENT only works in the test suite, and this is the only test that fails if we ignore it, I'm thinking we should rip out all the ARVADOS_EXTERNAL_CLIENT stuff, and rewrite this test so it mocks a keep_services/accessible response, instead of convincing the whole nginx/controller/rails stack to return a proxy.
Updated by Tom Clegg about 2 years ago
17344-easy-demo @ 3d99c1541a450411a847c1c2b87721a4c51b484e -- developer-run-tests: #3276
Replaces the using_proxy test with one that uses a mock.
Updated by Tom Clegg about 2 years ago
- removes ARVADOS_EXTERNAL_CLIENT
Updated by Tom Clegg about 2 years ago
- Target version changed from 2022-08-31 sprint to 2022-09-14 sprint
Updated by Lucas Di Pentima about 2 years ago
- File dispatch-cloud.log added
I have been testing the new branch on a freshly created DigitalOcean droplet; the diagnostics run only failed on the test container run.
Attached is the dispatch-cloud part of the logs, just in case there's some clue about what was going on.
I'll retry with another VPS just in case I did something wrong.
Updated by Lucas Di Pentima about 2 years ago
I've retried everything with a new VPS and installing Postgresql & docker before initializing the cluster. It worked great!
Updated by Tom Clegg about 2 years ago
I suspect crunch-run set the "broken node" flag. Real drivers fix it by destroying the node and creating a new one. Loopback driver needs to explicitly delete it.
17344-easy-demo @ ee158449ac8cc70708a161cd36845f57b5a248f1
Updated by Lucas Di Pentima about 2 years ago
This LGTM. I have a related comment:
- Package building finished with a message like: {:timestamp=>"2022-09-05T19:51:55.483136+0000", :message=>"Created package", :path=>"/pkg/arvados-server-easy_2.1.0-2866-g10440ac12_amd64.deb"} -- but the real path was "/tmp/*.deb"
I know this is supposed to be used by CI tools but it could be confusing when debugging a package building pipeline.
Updated by Tom Clegg about 2 years ago
Hm, yes. From fpm's perspective inside the container, the output is always /pkg/*.deb. Maybe we just need to print our own log message after that, with the real (host) path.
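A tiny sketch of what that follow-up log message could look like (the helper name and arguments are hypothetical):

    package main

    import (
        "fmt"
        "path/filepath"
        "strings"
    )

    // logHostPath translates fpm's in-container path (/pkg/...) to the
    // directory that was bind-mounted from the host, then logs that instead.
    func logHostPath(fpmPath, hostPackageDir string) {
        hostPath := filepath.Join(hostPackageDir, strings.TrimPrefix(fpmPath, "/pkg/"))
        fmt.Printf("created package: %s\n", hostPath)
    }

    func main() {
        logHostPath("/pkg/arvados-server-easy_2.1.0-2866-g10440ac12_amd64.deb", "/tmp")
        // prints: created package: /tmp/arvados-server-easy_2.1.0-2866-g10440ac12_amd64.deb
    }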
Updated by Tom Clegg about 2 years ago
17344-easy-demo @ 0840aec1ec6fdcce4d1a317578bc1f5f5be1a1f6
$ go run ./cmd/arvados-package build -package-dir /tmp -package-version $(git describe) -target-os debian:11
...
-rw-r--r-- 1 tom tom 100955048 Sep 8 10:27 /tmp/arvados-server-easy_2.1.0-2869-g0840aec1e_amd64.deb
Updated by Tom Clegg about 2 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados-private:commit:arvados|72ed2e6e260d8e12e49716a261b6306d8de13e8d.