Idea #22580

open

new method for launching a test or development environment which can run tests and bring up an auto-configured, usable cluster in "development" mode

Added by Peter Amstutz about 2 months ago. Updated about 2 months ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
Tests
Target version:
Start date:
Due date:
Story points:
-

Description

The purpose of arvbox is

  1. to provide a self-contained developer environment capable of running the entire test suite
  2. to enable launching a self-contained, auto-configured cluster that can support integration tests (such as running CWL workflows) and manual testing of the components an end user interacts with, such as Workbench and keep-web.

Arvbox has significant overlap with other functionality -- all of which was written after arvbox was created. The approaches taken by arvbox were not intended to be general purpose, whereas these newer methods (mostly based around Ansible) are general purpose, and thus could support a new arvbox.

So I'm thinking about how a new iteration of arvbox should work.

Current functional overlap:

  • the arvbox Dockerfile uses arvados-server install and installs some additional packages, but arvados-server install is redundant with the new Ansible playbook and will be removed (#22436)
  • arvbox can launch run-tests, but the "test" environment (set up by run-tests) has entirely separate code from the arvbox scripts that create a "development" environment. Having separate binaries depending on how you're running things is a bit confusing.
  • arvbox has its own code to configure and launch services, which overlaps with code in run-tests, sdk/python/tests/run_test_server.py, arvados-server boot, and the production systemd units

Provisioning

We've agreed to standardize on Ansible for provisioning and configuration: we give Ansible an Arvados configuration file and an inventory, and Ansible uses the inventory to provision each node for the roles we want it to serve.
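
Just as a sketch of that intended workflow (the playbook, inventory, and variable names here are hypothetical, not the actual files in the Arvados tree):

    # Hypothetical: provision the hosts listed in the inventory according
    # to the Arvados cluster config. File names are illustrative only.
    ansible-playbook -i inventory.yml \
        -e arvados_config_file=config.yml \
        install-arvados-cluster.yml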

(The previous method of provisioning, arvados-server install, is already on its way out.)

For "arvbox2" it would be great to be able to offload as much as possible to general purpose Ansible playbooks. If so, then arvbox2 could focus on virtual environment management and knowing how to launch "run-tests.sh" or "launch a development arvados cluster" in those environments.

Launching services

As mentioned earlier, we've got a bunch of different approaches for building and launching services.

run-tests has the install/* functions to build each component, and uses sdk/python/tests/run_test_server.py to do some of the configuration and launching.
run-tests also contains some logic about which tests require services and which tests don't. Many tests that interact with the test mode API server also have built-in assumptions that the database is populated specifically with the test fixtures defined in services/api/test/fixtures (even tests written in Python or Go).
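
For reference, run-tests.sh can already scope a run to a single component, roughly like this (flags are from memory; check --help for the current set):

    # Run only one component's tests; run-tests.sh still brings up the
    # shared services (PostgreSQL, test API server, keepstore, ...) that
    # the selected tests assume are present, along with the API fixtures.
    ./build/run-tests.sh --only sdk/python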

arvados-server boot is used to start up a partial cluster for the purposes of running Cypress integration tests of Workbench 2. I'm not exactly sure of the scope of its capabilities, except that it clearly knows how to bring up the API server and controller.

In production, we use systemd units to launch services.

Virtual environments

A big part of what the arvbox shell script (the one the user interacts with on the host) does is manage the Docker container(s), which are brought up with a particular set of command-line options to bind-mount various things into the container so they stay persistent even when the container itself is torn down.
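
Conceptually the wrapper does something along these lines (paths and image name are illustrative, not the literal arvbox invocation):

    # Illustrative only -- not the actual options arvbox passes to Docker.
    # Named host directories are bind-mounted in so that the database, state,
    # and the source tree survive the container being torn down.
    docker run --detach --name arvbox \
        -v ~/.arvbox/postgres:/var/lib/postgresql \
        -v ~/.arvbox/var:/var/lib/arvados \
        -v ~/arvados:/usr/src/arvados \
        arvados/arvbox-dev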

One of the reasons for doing it this way was to draw clear lines between what is stateful in the container and what isn't: if something is changed in a part of the file system that isn't preserved, it had better be scripted so it is re-configured on the next boot. It keeps us honest.

This brings up questions about what container or VM technology to use. Ones that we have some experience with include:

  • Docker (currently used by arvbox)
  • systemd-nspawn
  • kvm

Other container runners:

  • podman
  • Singularity (included for completeness)

Docker

pros:

  • The industry standard
  • We have a ton of operational experience with it
  • Familiar to lots of other people

cons:

Running systemd inside Docker is notoriously awkward. Because of this, arvbox uses "runit", which means none of the service scripts for arvbox are particularly useful in any other environment.

If we decided we wanted to use systemd consistently for managing services (whether test/development/production) then we'd need to solve this somehow.

There's a systemd stand-in that does minimal service management:

https://github.com/gdraheim/docker-systemctl-replacement
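
If we went that route, the idea is just to install its script in place of the real systemctl during the image build, something like this (the path inside that repo is from memory and may have changed):

    # Sketch: install the stand-in as /usr/bin/systemctl inside the image,
    # so "systemctl start foo" works without a real systemd as PID 1.
    curl -fsSL -o /usr/bin/systemctl \
        https://raw.githubusercontent.com/gdraheim/docker-systemctl-replacement/master/files/docker/systemctl3.py
    chmod +x /usr/bin/systemctl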

systemd-nspawn

pros:

Presumably already packaged everywhere systemd is used; doesn't require adding external package repositories (unlike, e.g., Docker Community Edition).

Simpler than Docker: you give it a root directory representing your container and some configuration for how to run it.

You get a real init process at PID 1 which runs systemd units as intended.
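
For example, bootstrapping and booting a container with a real systemd as PID 1 is roughly (machine name and distribution are arbitrary):

    # Create a minimal Debian root filesystem, then boot it; systemd inside
    # the container runs as PID 1 and starts units normally.
    debootstrap stable /var/lib/machines/arvbox2
    systemd-nspawn -D /var/lib/machines/arvbox2 --machine=arvbox2 --boot
    # From another terminal, get a shell inside the running container:
    machinectl shell arvbox2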

cons:

Less well known than Docker

Requires additional steps to set up networking to make it easy for the host, container, local network, and Internet to all communicate.

Singularity

pros:

Runs applications in userspace, no root access required.

cons:

May not provide the features/additional privileges required to run all the Arvados services.

kvm

pros:

Full virtualization: runs a real Linux kernel and a complete OS.

Greatest isolation.

Can run a whole desktop in a window.

cons:

Takes longer to start and stop than a container.

On cloud, we'd be running a virtual machine within a virtual machine; nested virtualization may not be possible in some environments (e.g. a quick search suggests it may be possible on GCP but you can't do it on EC2).

Requires additional steps to set up networking to make it easy for the host, VM, local network, and Internet to all communicate.

Abstraction layers

libvirt and virsh

https://ubuntu.com/server/docs/libvirt

This is the standard interface for kvm, but it also supports LXC, a Linux container technology that has been around since before Docker. However, we have no operational experience with LXC or with how it differs from Docker.
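
For illustration, the same virsh front end drives both kvm and LXC just by changing the connection URI (the domain names here are made up):

    # Only the connection URI changes between a kvm VM and an LXC container.
    virsh -c qemu:///system start arvbox-vm    # a kvm virtual machine
    virsh -c lxc:/// start arvbox-lxc          # an LXC container domain
    virsh -c qemu:///system list --all         # list defined kvm domains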

Vagrant

https://github.com/hashicorp/vagrant

Specifically intended to help create developer environments using different container/virtualization technologies, but now has an icky "Business Source License".

Actions #1

Updated by Peter Amstutz about 2 months ago

  • Position changed from -939566 to -939559
Actions #2

Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)
  • Subject changed from feature in run-tests that brings up a usable cluster & lets you rebuild/restart individual services similar to arvbox to new method for bringing up an auto-configured, usable cluster in "development" mode & lets you rebuild/restart individual services
Actions #3

Updated by Peter Amstutz about 2 months ago

  • Subject changed from new method for bringing up an auto-configured, usable cluster in "development" mode & lets you rebuild/restart individual services to new method for launching a test or development environment which can run tests and bring up an auto-configured, usable cluster in "development" mode
Actions #4

Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz about 2 months ago

  • Description updated (diff)