Idea #21880
New installer (status: open)
Description
The current stack of installer.sh + provision.sh + Salt + stacks/pillars/formulas increasingly feels overcomplicated and brittle, in part due to concerns that the Salt ecosystem doesn't really seem to be keeping up.
To start the discussion, here's a sketch:
- Run Terraform and capture the output state such that it can be used directly as input to Ansible
  - Ideally the description of resources is somewhat independent of infrastructure and could be written by hand for fixed on-premises resources as well
  - We want to tag resources with roles at this point
  - Perhaps other cluster configuration could be declared as constants that pass from input to output, so we get one output state that has everything required to deploy the cluster?
- Run Ansible on the output state produced by Terraform
  - Goes through each role and configures the machines that have that role by writing config files and installing packages
  - When installation is done, also runs diagnostics automatically
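One way to make the Terraform/OpenTofu output "directly usable as input to Ansible" is a small dynamic inventory script. The sketch below assumes hypothetical outputs named `cluster_hosts` (a list of objects with `name`, `address` and `roles` fields) and `cluster_config` (cluster-wide constants); those names and shapes are illustrative only, not something our configuration defines today. Each role tag becomes an Ansible group, and the constants become variables on the `all` group, so one output state carries everything needed to deploy.

```python
#!/usr/bin/env python3
"""Sketch of an Ansible dynamic inventory built from `tofu output -json`.

Assumes hypothetical OpenTofu outputs:
  cluster_hosts  - list of {"name": ..., "address": ..., "roles": [...]}
  cluster_config - map of cluster-wide constants (optional)
"""
import json
import subprocess
import sys


def inventory_from_tofu_output(tofu_output: dict) -> dict:
    """Convert parsed `tofu output -json` data into Ansible's inventory-script format."""
    hosts = tofu_output["cluster_hosts"]["value"]
    cluster_vars = tofu_output.get("cluster_config", {}).get("value", {})

    inventory = {
        # Cluster-wide constants become variables visible to every host.
        "all": {"vars": cluster_vars},
        # Supplying _meta.hostvars lets Ansible skip per-host --host calls.
        "_meta": {"hostvars": {}},
    }
    for host in hosts:
        inventory["_meta"]["hostvars"][host["name"]] = {
            "ansible_host": host["address"],
        }
        # Each role tag becomes an Ansible group containing this host.
        for role in host["roles"]:
            inventory.setdefault(role, {"hosts": []})["hosts"].append(host["name"])
    return inventory


def main(argv: list) -> None:
    if "--list" in argv:
        # Assumes it is run from the OpenTofu working directory holding the state.
        result = subprocess.run(
            ["tofu", "output", "-json"], check=True, capture_output=True, text=True
        )
        print(json.dumps(inventory_from_tofu_output(json.loads(result.stdout))))
    elif "--host" in argv:
        print("{}")  # host vars were already provided via _meta
    else:
        print("usage: tofu_inventory.py --list | --host HOSTNAME", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Marked executable, a script like this can be handed to Ansible with `-i` (e.g. `ansible-playbook -i tofu_inventory.py site.yml`, where the playbook name is hypothetical). The same conversion function would work unchanged on a hand-written JSON file for fixed on-premises hosts.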
FWIW, it seems the integration could go several ways:
- Terraform runs Ansible - https://github.com/ansible/terraform-provider-ansible
- Ansible runs Terraform
- Manually run Terraform and then Ansible
- Orchestrate them from a 3rd tool or script
(my feeling about the last two is "ugh", but they're included for completeness)
Also, since Terraform is now under the Business Source License, we should plan on migrating to OpenTofu https://opentofu.org/
Updated by Lucas Di Pentima 4 months ago
As someone with zero experience in Ansible, I would like to take some time to learn the basics so that I can research the best practices for our case.
Updated by Brett Smith 4 months ago
Rough notes:
The idea would be to ditch both Salt and Terraform completely and just use Ansible. Ansible does not have the same level of "manage your whole cloud" features that Terraform does, but it has enough to create and cross-configure cloud resources, which should be enough for our purposes for the foreseeable future. You can do things like run a playbook that creates an RDS instance, creates an EC2 instance, then SSHes into the EC2 instance to configure software to use the RDS instance, without separately recording the RDS configuration anywhere. Using a single system would avoid the problems our current installer has with keeping configuration in sync across Terraform, Salt, and related tools like the compute node image builder.
Ansible playbooks run linearly and describe the state you want. Each task checks whether the desired state already exists and, if not, makes it so by creating or modifying resources as needed. With a little care, it should be possible to write a single playbook that can both create an Arvados cluster and then safely re-run on that same cluster to make configuration changes.
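The "check first, change only if needed" behavior is what makes re-running a playbook safe. As a rough illustration in plain Python (not how Ansible's modules are actually implemented), here is a hypothetical `ensure_setting` helper for a simple `key=value` file: it reports whether it changed anything, much like a task reports `changed` or `ok`, and a second run with the same input is a no-op.

```python
import tempfile
from pathlib import Path


def ensure_setting(path: Path, key: str, value: str) -> bool:
    """Ensure `key=value` is present in a simple key=value file.

    Returns True if the file had to be changed, False if it was already in the
    desired state (analogous to an Ansible task reporting "changed" vs "ok").
    """
    desired = f"{key}={value}"
    lines = path.read_text().splitlines() if path.exists() else []

    found = False
    new_lines = []
    for line in lines:
        if line.split("=", 1)[0] == key:
            found = True
            new_lines.append(desired)  # overwrite any existing value for key
        else:
            new_lines.append(line)
    if not found:
        new_lines.append(desired)

    if new_lines == lines:
        return False  # desired state already exists: touch nothing
    path.write_text("\n".join(new_lines) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "example.conf"
        print(ensure_setting(conf, "port", "8080"))  # True: file created
        print(ensure_setting(conf, "port", "8080"))  # False: already correct
        print(ensure_setting(conf, "port", "9090"))  # True: value updated
```

The same pattern, applied task by task, is what would let one playbook serve both for initial cluster creation and for later configuration changes.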
Handy links:
- Ansible modules - The basic building blocks of playbooks. Conveniently, AWS is the first thing on the list, so you can get a sense of its cloud capabilities. After that, the "builtin" collection covers a lot of the fundamental Unix admin capabilities.
- Ansible Galaxy - The source for third-party content, most akin to the Salt formulas we use now.
- Inventory documentation - The inventory is the list of all the hosts you might want to run playbooks on, plus their associated configuration. This walks through your basic options for setting that up.
- Facts and variables - Ansible automatically discovers lots of things about managed hosts and makes them available to playbooks and templates as variables.
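Inventory variables and discovered facts are layered, with more specific sources overriding more general ones. The sketch below is a deliberately simplified Python model of that layering using `collections.ChainMap`; real Ansible has many more precedence levels (play vars, task vars, `set_fact`, and so on), and the ordering here only approximates the documented one. The function and argument names are hypothetical.

```python
from collections import ChainMap


def effective_vars(role_defaults, all_group_vars, group_vars, host_vars, facts, extra_vars):
    """Return a read-through view of one host's variables.

    ChainMap looks keys up left to right, so earlier maps win. The order below
    is a simplified approximation of Ansible's precedence, highest first:
      extra vars > facts > host vars > specific group vars > "all" vars > role defaults
    `group_vars` is a list of dicts ordered from lowest to highest priority.
    """
    return ChainMap(
        extra_vars,
        facts,
        host_vars,
        *reversed(group_vars),  # later groups in the list override earlier ones
        all_group_vars,
        role_defaults,
    )


if __name__ == "__main__":
    v = effective_vars(
        role_defaults={"port": 80, "workers": 2},
        all_group_vars={"cluster_id": "zzzzz"},
        group_vars=[{"workers": 4}],
        host_vars={"workers": 8},
        facts={"ansible_hostname": "keep0"},
        extra_vars={},
    )
    print(v["workers"], v["port"], v["cluster_id"])  # 8 80 zzzzz
```

In practice this means cluster-wide defaults can live on the `all` group, role-specific settings on each role's group, and only genuine per-host exceptions need to be written down per host.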