Idea #22580
open
new method for launching a test or development environment which can run tests and bring up an auto-configured, usable cluster in "development" mode
Description
The purpose of arvbox is:
- to provide a self-contained developer environment capable of running the entire test suite
- to enable launching a self-contained, auto-configured cluster that can support integration tests (such as running CWL workflows) and manual testing of components that the end user might interact with, such as Workbench and keep-web.
Arvbox has significant overlap with other functionality, all of which was written after arvbox was created. The approaches taken by arvbox were not intended to be general purpose, whereas these new methods (mostly based around Ansible) are general purpose, and thus could support a new arvbox.
So I'm thinking about how a new iteration of arvbox should work.
Current functional overlap:
- arvbox Dockerfile uses arvados-server install plus installs some additional packages, but arvados-server install is redundant with the new Ansible playbook and will be removed (#22436)
- arvbox can launch run-tests, but the "test" environment (set up by run-tests) has entirely separate code from the arvbox scripts that create a "development" environment. Having separate binaries depending on how you're running things is a bit confusing.
- arvbox has its own code to configure and launch services, which overlaps with code in run-tests, sdk/python/tests/run_test_server.py, arvados-server boot and the production systemd units
Provisioning
We've agreed to standardize on Ansible for provisioning and configuration, based on giving Ansible an Arvados configuration file and an inventory and then having Ansible use the inventory to provision nodes based on what we want to use them for.
(The previous method of provisioning, arvados-server install, is already on its way out.)
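As a rough illustration of the "configuration file plus inventory" model, the sketch below builds an inventory for a single-node development cluster in the JSON shape that Ansible accepts from dynamic inventory scripts. The group names, the `arvados_config_file` variable, and the host name are hypothetical placeholders, not names taken from the Arvados playbooks.

```python
import json

# Hypothetical role names; the real playbooks may group hosts differently.
DEV_ROLES = ("arvados_api", "arvados_controller", "arvados_keepstore", "arvados_workbench")


def single_node_inventory(host, config_path, roles=DEV_ROLES):
    """Return an Ansible dynamic-inventory dict that places every role on one host."""
    inventory = {role: {"hosts": [host]} for role in roles}
    # "_meta.hostvars" is where a dynamic inventory carries per-host variables.
    inventory["_meta"] = {
        "hostvars": {host: {"arvados_config_file": config_path}}  # hypothetical variable name
    }
    return inventory


if __name__ == "__main__":
    inv = single_node_inventory("arvbox2.test", "/etc/arvados/config.yml")
    print(json.dumps(inv, indent=2))
```

A multi-node test cluster would use the same structure with different hosts listed under each role, which is the point of letting the inventory decide what each node is used for.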
For "arvbox2" it would be great to be able to offload as much as possible to general purpose Ansible playbooks. If so, then arvbox2 could focus on virtual environment management and knowing how to launch "run-tests.sh" or "launch a development arvados cluster" in those environments.
Launching services
As mentioned earlier, we've got a bunch of different approaches for building and launching services.
run-tests has the install/* functions to build each component, and uses sdk/python/tests/run_test_server.py to do some of the configuration and launching. run-tests also contains some logic about which tests require services and which tests don't. Many tests that interact with the test mode API server also have built-in assumptions that the database is populated specifically with the test fixtures defined in services/api/test/fixtures (even tests written in Python or Go).
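One way to make the "which tests require services" knowledge explicit is a table that any launcher could consult. The sketch below is hypothetical: the suite names, service names, and the mapping between them are placeholders for illustration, not what run-tests actually defines.

```python
# Hypothetical mapping from test suite to the services it needs running.
# An empty set means the suite is self-contained.
SUITE_REQUIREMENTS = {
    "lib/config": set(),
    "sdk/python": {"api", "controller", "keepstore"},
    "services/keep-web": {"api", "controller", "keepstore"},
}


def services_needed(suites):
    """Return the sorted union of services required by the selected suites."""
    needed = set()
    for suite in suites:
        needed |= SUITE_REQUIREMENTS[suite]  # a KeyError flags an unknown suite
    return sorted(needed)


def needs_fixtures(suites):
    """Assume any suite that talks to the API server expects the test fixtures."""
    return "api" in services_needed(suites)
```

With a table like this, a runner could skip service startup entirely for self-contained suites and load the fixtures only when the API server is involved.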
arvados-server boot is used to start up a partial cluster for the purposes of running Cypress integration tests of Workbench 2. I'm not exactly sure of the scope of its capabilities, except that it clearly knows how to bring up API server and controller.
In production, we use systemd units to launch services.
Virtual environments
A big part of what the arvbox shell script (that the user interacts with on the host) does is manage the Docker container(s), which are brought up with a particular set of command line options to bind-mount various things into the container, making them persistent while still being able to tear down the container itself.
One of the reasons for doing it this way was to draw clear lines between what is stateful in the container and what isn't, so if the container environment is modified a certain way that involves changing some part of the file system that isn't preserved, that had better be something that is scripted to be re-configured on the next boot. It keeps us honest.
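The stateful/stateless split can be sketched as a function that builds the `docker run` command line, with every persistent directory listed in one place. The container paths, image name, and state directory layout below are assumptions made for illustration; they are not the mounts arvbox actually uses.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical container paths whose contents should survive container teardown.
PERSISTENT_DIRS = {
    "postgres": "/var/lib/postgresql",
    "keep": "/var/lib/arvados/keep",
    "src": "/usr/src/arvados",
}


def docker_run_argv(name, image, state_root):
    """Build a `docker run` command that bind-mounts each persistent directory."""
    argv = ["docker", "run", "--detach", "--name", name]
    for label, container_path in PERSISTENT_DIRS.items():
        host_path = PurePosixPath(state_root) / name / label
        argv += ["--volume", f"{host_path}:{container_path}"]
    argv.append(image)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(docker_run_argv("devbox", "example/arvbox2:latest", "/home/me/.arvbox2")))
```

Anything not listed in `PERSISTENT_DIRS` is lost when the container is removed, so it has to be recreated by scripted configuration on the next boot.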
This brings up questions about what container or VM technology to use. Ones that we have some experience with include:
- Docker (currently used by arvbox)
- systemd-nspawn
- kvm
Other container runners:
- podman
- Singularity (included for completeness)
Docker
pros:
- The industry standard
- We have a ton of operational experience with it
- Familiar to lots of other people
cons:
Running systemd inside Docker is notoriously awkward. Because of this, Arvbox uses "runit" which means none of the service scripts for arvbox are particularly useful for any other environment.
If we decided we wanted to use systemd consistently for managing services (whether test/development/production) then we'd need to solve this somehow.
There's a systemd stand-in that does minimal service management:
https://github.com/gdraheim/docker-systemctl-replacement
systemd-nspawn
pros:
Presumably already packaged everywhere systemd is used, doesn't require adding external repositories (e.g. Docker community edition).
Simpler than Docker, you give it a root directory representing your container and some configuration for how to run the container.
You get a real init process at PID 1 which runs systemd units as intended.
cons:
Less well known than Docker
Requires additional steps to set up networking to make it easy for the host, container, local network, and Internet to all communicate.
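For comparison with the Docker approach, here is a minimal sketch of how a wrapper might assemble a systemd-nspawn command line. The root directory, machine name, and bind paths are hypothetical; the flags used (`--directory`, `--machine`, `--boot`, `--bind`, `--network-veth`) are standard systemd-nspawn options.

```python
def nspawn_argv(root_dir, machine, binds=(), private_network=False):
    """Build a systemd-nspawn command that boots an init system from root_dir."""
    argv = [
        "systemd-nspawn",
        f"--directory={root_dir}",
        f"--machine={machine}",
        "--boot",  # run the container's own init (systemd) as PID 1
    ]
    for host_path, container_path in binds:
        argv.append(f"--bind={host_path}:{container_path}")
    if private_network:
        # Creates a virtual Ethernet link between host and container;
        # addressing and routing still have to be configured separately.
        argv.append("--network-veth")
    return argv
```

For example, `nspawn_argv("/var/lib/machines/arvbox2", "arvbox2", binds=[("/home/me/arvados", "/usr/src/arvados")])` would boot the container with the source tree bind-mounted from the host.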
Singularity
pros:
Runs applications in userspace, no root access required.
cons:
May not provide the features/additional privileges required to run all the Arvados services.
kvm
pros:
Full virtualization, runs its own Linux kernel and a full OS.
Greatest isolation.
Can run a whole desktop in a window.
cons:
Takes longer to start and stop than a container.
On cloud, we'd be running a virtual machine within a virtual machine; nested virtualization may not be possible in some environments (e.g. a quick search suggests it may be possible on GCP but you can't do it on EC2).
Requires additional steps to set up networking to make it easy for the host, VM, local network, and Internet to all communicate.
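Whether kvm is usable on a given host (including a cloud VM) can be probed before choosing it. The sketch below assumes a Linux host and relies on two conventions: `/dev/kvm` exists when the kvm module can be used, and the `kvm_intel` / `kvm_amd` modules expose a `nested` parameter under `/sys/module`. The function names are made up for this example.

```python
from pathlib import Path

NESTED_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def nested_flag_enabled(text):
    """Interpret the contents of a kvm 'nested' module parameter file."""
    return text.strip() in ("1", "Y", "y")


def kvm_status(dev_kvm=Path("/dev/kvm"), params=NESTED_PARAMS):
    """Report whether kvm is usable here and whether guests may nest further."""
    nested = False
    for param in params:
        if param.exists():
            nested = nested_flag_enabled(param.read_text())
            break
    return {"kvm_available": dev_kvm.exists(), "nested_enabled": nested}


if __name__ == "__main__":
    print(kvm_status())
```

Inside a cloud instance, `kvm_available` is the value that matters: if it is false, the provider is not exposing virtualization extensions to the instance and a container-based environment would be needed instead.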
Abstraction layers
libvirt and virsh
https://ubuntu.com/server/docs/libvirt
This is the standard interface for kvm, but it also supports LXC, a container technology for Linux that predates Docker. However, we have no operational experience with LXC and how it differs from
Vagrant
https://github.com/hashicorp/vagrant
Specifically intended to help create developer environments using different container/virtualization technologies, but now has an icky "Business Source License".
Updated by Peter Amstutz about 2 months ago
- Position changed from -939566 to -939559
Updated by Peter Amstutz about 2 months ago
- Description updated (diff)
- Subject changed from feature in run-tests that brings up a usable cluster & lets you rebuild/restart individual services similar to arvbox to new method for bringing up an auto-configured, usable cluster in "development" mode & lets you rebuild/restart individual services
Updated by Peter Amstutz about 2 months ago
- Subject changed from new method for bringing up an auto-configured, usable cluster in "development" mode & lets you rebuild/restart individual services to new method for launching a test or development environment which can run tests and bring up an auto-configured, usable cluster in "development" mode