Idea #22580
Updated by Peter Amstutz about 2 months ago
The purpose of @arvbox@ is:

# to provide a self-contained developer environment capable of running the entire test suite
# to enable launching a self-contained, auto-configured cluster that can support integration tests (such as running CWL workflows) and manual testing of components the end user might interact with, such as Workbench and keep-web

Arvbox has significant overlap with other functionality, all of which was written after @arvbox@ was created. The approaches taken by @arvbox@ were not intended to be general purpose, whereas these newer methods (mostly based around Ansible) are, and thus could support a new arvbox. So I'm thinking about how a new iteration of arvbox should work.

Current functional overlap:

* the arvbox Dockerfile uses @arvados-server install@ plus installs some additional packages, but @arvados-server install@ is redundant with the new Ansible playbook and will be removed (#22436)
* arvbox can launch run-tests, but the "test" environment (set up by run-tests) has entirely separate code from the arvbox scripts that create a "development" environment. Having separate binaries depending on how you're running things is a bit confusing.
* arvbox has its own code to configure and launch services, which overlaps with code in @run-tests@, @sdk/python/tests/run_test_server.py@, @arvados-server boot@, and the production @systemd@ units

h2. Provisioning

We've agreed to standardize on Ansible for provisioning and configuration: give Ansible an Arvados configuration file and an inventory, and have Ansible use the inventory to provision nodes based on what we want to use them for. (The previous method of provisioning, @arvados-server install@, is already on its way out.)

For "arvbox2" it would be great to be able to offload as much as possible to general purpose Ansible playbooks. If so, then arvbox2 could focus on virtual environment management and knowing how to launch "run-tests.sh" or "launch a development arvados cluster" in those environments.

h3. Launching services

As mentioned earlier, we've got a bunch of different approaches for building and launching services.

@run-tests@ has the @install/*@ functions to build each component, and uses @sdk/python/tests/run_test_server.py@ to do some of the configuration and launching. @run-tests@ also contains some logic about which tests require services and which don't. Many tests that interact with the test-mode API server also have built-in assumptions that the database is populated specifically with the test fixtures defined in @services/api/test/fixtures@ (even tests written in Python or Go).

@arvados-server boot@ is used to start up a partial cluster for the purposes of running Cypress integration tests of Workbench 2. I'm not exactly sure what scope of capabilities it has, except that it clearly knows how to bring up the API server and controller.

In production, we use @systemd@ units to launch services.
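To make the overlap concrete, here is a rough sketch of the four launch paths described above. The commands are illustrative only; exact flags and arguments aren't shown, and the unit name on the last line is just an example.

<pre>
# Four different ways the same services get built/launched today (illustrative)

# 1. run-tests builds each component and starts the services the tests need
./build/run-tests.sh

# 2. run_test_server.py configures and starts a test API server directly
python3 sdk/python/tests/run_test_server.py start

# 3. arvados-server boot brings up a partial cluster (used for Workbench 2 Cypress tests)
arvados-server boot

# 4. production clusters manage services with systemd units
sudo systemctl restart arvados-controller
</pre>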
h2. Virtual environments

A big part of what the arvbox shell script (the one the user interacts with on the host) does is manage the docker container(s), which are brought up with a particular set of command line options to bind-mount various things into the container to make them persistent while still being able to tear down the container itself.

One of the reasons for doing it this way was to draw clear lines between what is stateful in the container and what isn't: if the container environment is modified in a way that changes some part of the file system that isn't preserved, that had better be something that is scripted to be re-configured on the next boot. It keeps us honest.

It would be great to be able to offload as much as possible to general purpose Ansible playbooks and other configuration code. If so, then arvbox2 could focus on virtual environment management and would only need to launch general purpose "run-tests.sh" or "launch a development arvados cluster" entry points.

This brings up questions about what container or VM technology to use. Ones that we have some experience with include:

* Docker (currently used by arvbox)
* systemd-nspawn
* kvm

Other container runners:

* podman
* Singularity (included for completeness)

h3. Docker

pros:

* The industry standard
* We have a ton of operational experience with it
* Familiar to lots of other people

cons:

Running systemd inside Docker is notoriously awkward. Because of this, arvbox uses "runit", which means none of the service scripts for arvbox are particularly useful in any other environment. If we decided we wanted to use systemd consistently for managing services (whether test/development/production), then we'd need to solve this somehow. There's a systemd stand-in that does minimal service management: https://github.com/gdraheim/docker-systemctl-replacement

h3. systemd-nspawn

pros:

* Presumably already packaged everywhere systemd is used; doesn't require adding external repositories (e.g. Docker community edition).
* Simpler than Docker: you give it a root directory representing your container and some configuration for how to run the container.
* You get a real init process at PID 1, which runs systemd units as intended (see the sketch at the end of this note).

cons:

* Less well known than Docker.
* Requires additional steps to set up networking so the host, container, local network, and Internet can all communicate.

h3. Singularity

pros:

* Runs applications in userspace; no root access required.

cons:

* May not provide the features/additional privileges required to run all the Arvados services.

h3. kvm

pros:

* Full virtualization: runs a real Linux kernel and a full OS. Greatest isolation.
* Can run a whole desktop in a window.

cons:

* Takes longer to start and stop than a container.
* On cloud, we'd be running a virtual machine within a virtual machine; nested virtualization may not be possible in some environments (e.g. a quick search suggests it may be possible on GCP but not on EC2).
* Requires additional steps to set up networking so the host, VM, local network, and Internet can all communicate.

h3. Abstraction layers

h4. libvirt and virsh

https://ubuntu.com/server/docs/libvirt

This is the standard interface for kvm, but it also supports LXC, a container technology for Linux that has been around since before Docker. However, we have no operational experience with LXC or how it differs from Docker.

h4. Vagrant

https://github.com/hashicorp/vagrant

Specifically intended to help create developer environments using different container/virtualization technologies, but it now has an icky "Business Source License".
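To go with the systemd-nspawn discussion above, here is a minimal sketch of what an nspawn-based arvbox2 session could look like. The machine name, paths, and bind mount are hypothetical; the point is that the container boots a real systemd as PID 1, so the same unit files used in production could also manage services in the development environment.

<pre>
# Build a minimal Debian root filesystem to use as the container (path is an example)
sudo debootstrap stable /var/lib/machines/arvbox2

# Boot it: --boot runs the container's own systemd as PID 1, and --bind
# mounts a source tree into the container (both paths are examples)
sudo systemd-nspawn --directory=/var/lib/machines/arvbox2 --machine=arvbox2 \
    --bind=/path/to/arvados:/usr/src/arvados --boot

# Services inside the container can then be managed with ordinary systemd units
sudo systemctl --machine=arvbox2 status arvados-controller
</pre>

@systemctl --machine@ talks to the systemd instance running inside the container, which is exactly the part that Docker makes awkward.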