Feature #22317

open

Compute image builder uses Ansible to provision compute node (still invoked by Packer)

Added by Peter Amstutz 8 days ago. Updated about 15 hours ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: Deployment
Target version:
Story points: -

Subtasks 1 (1 open, 0 closed)

  • Task #22334: Review (New, assigned to Lucas Di Pentima)
#1

Updated by Peter Amstutz 8 days ago

  • Status changed from New to In Progress
#2

Updated by Peter Amstutz 8 days ago

  • Target version changed from Development 2024-12-04 to Development 2024-11-20
#4

Updated by Peter Amstutz 3 days ago

  • Assigned To set to Brett Smith
#5

Updated by Brett Smith 1 day ago

22317-compute-node-ansible @ 2638fb1852ec48f5b65364768ab9978dda818efa

This is far enough along that people should look at it, and we should make sure we like the approach, before I continue with updating documentation and other interfaces. It is not ready to be merged as-is.

I have used this branch to build and successfully test a new compute image for tordo. ami-077c29f012ca7df74, tordo-xvhdp-s60utv85ngx4as9. Note this image uses Docker, not Singularity. I also manually booted an instance of this AMI and SSH'ed into it to dig into the state of specific systemd services, individual configuration files, etc. Everything looks as expected.

Assuming you already have Packer installed locally, you can test this yourself by installing Ansible and the Packer provisioner for it. You can adjust the Ansible install path to taste:

python3 -m venv ~/ansible
~/ansible/bin/pip install 'ansible~=8.7'
packer plugins install github.com/hashicorp/ansible

Once this is done, you can test by activating the virtualenv with . ~/ansible/bin/activate, then running build.sh as you normally would.

The choice of using Ansible 8 was semi-arbitrary. Basically every release of Ansible supports three versions of Python, and this was the latest Ansible version that covered the older Python versions we care about. Our final version requirement may need some additional flexibility. But for now I wanted to target a specific version for simplicity of testing.

The way this works is that when Packer gets to the provision step, it runs ansible-playbook in a way that's configured to know about a single host, named default, which is the cloud instance it created. Ansible runs through the tasks defined in build-compute-image.yml in order. When a role is included, Ansible runs the tasks defined in roles/NAME/tasks/main.yml. Tasks in a role can refer to files and templates by relative path, which are found under the role's files and templates subdirectories respectively.
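
To make that layout concrete, here is a minimal hypothetical sketch; the role and file names are illustrative, not the branch's actual contents:

```yaml
# build-compute-image.yml: Packer points ansible-playbook at this file,
# and the play targets the single host it created.
- hosts: default
  become: true
  roles:
    - compute_example   # runs roles/compute_example/tasks/main.yml

# roles/compute_example/tasks/main.yml
- name: Install a config file shipped with the role
  ansible.builtin.copy:
    src: example.conf        # resolved under roles/compute_example/files/
    dest: /etc/example.conf
    mode: "0644"
```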

Ansible uses Jinja templates demarcated with {{ }} for variable data. Default values for variables are set in roles/NAME/defaults/main.yml; values that the user provides override them. build.sh generates an override file for you and passes it on to Ansible.
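
As a hypothetical sketch of that precedence (the variable name is illustrative, not one the branch defines):

```yaml
# roles/compute_example/defaults/main.yml: the role's fallback value
example_listen_port: 9100

# A template under roles/compute_example/templates/ would reference it as:
#   listen_port = {{ example_listen_port }}

# The generated override file, passed to ansible-playbook with
# --extra-vars "@overrides.yml", wins over the role default:
example_listen_port: 9200
```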

The Git commit message has some details about some things I did a little differently than base.sh and why.

Personally, I would like to get away from build.sh. The interface is not especially friendly, and it's a lot of code to provide relatively marginal value IMO. I would rather this whole build step work more like Arvados itself: we give you a YAML file of Ansible settings you can edit, plus one JSON file per cloud of Packer settings you can edit, and then you just invoke Packer with your edited JSON. I am curious what you think about that. If it sounds good to you, I can make it part of the branch, and update documentation accordingly.
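
As a rough sketch of what that could look like, with entirely hypothetical file and variable names: the user edits one small settings file, then runs packer build directly against the cloud-specific template.

```yaml
# compute-image-settings.yml (hypothetical): the one file a user edits,
# after which they invoke Packer themselves, e.g.:
#   packer build -var-file=aws.json build-compute-image.json
arvados_cluster_id: xxxxx
container_runtime: docker
```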

#6

Updated by Lucas Di Pentima 1 day ago

My comments:

  • Code looks a lot cleaner and organized than the previous thing.
  • I agree that getting rid of build.sh will simplify things even further and we won't be losing a lot in the process.
  • I realized that cloud-init is not requested to be installed anymore; is that on purpose?
  • Are you planning on supporting Ubuntu as well? AFAIK the previous Packer script was usable with Ubuntu-based AMIs.
  • Would it be convenient/possible to declare & check the expected Ansible (or its plugin) version from Packer?
#7

Updated by Brett Smith 1 day ago

Okay, I'll keep working on it from here, but for background:

Lucas Di Pentima wrote in #note-6:

  • I realized that cloud-init is not requested to be installed anymore; is that on purpose?

Yes. By my reading, the main reason we installed that was to hook in the encrypted partition script as a boot script. Now that I've written a systemd service definition for that, we don't have any specific need for cloud-init ourselves.

To be clear, I expect most if not all of the images that we build on top of to use it. But in that case it should already be installed, so we shouldn't need to install it again. If you know something I missed, please let me know.
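
For illustration, enabling a boot-time service like that from Ansible only takes a couple of tasks. The unit and file names here are hypothetical, not the branch's actual code:

```yaml
- name: Install the encrypted-partition setup unit (name is illustrative)
  ansible.builtin.copy:
    src: arvados-ensure-encrypted-partitions.service
    dest: /etc/systemd/system/arvados-ensure-encrypted-partitions.service
    mode: "0644"

- name: Enable the unit so it runs at every boot
  ansible.builtin.systemd:
    name: arvados-ensure-encrypted-partitions.service
    enabled: true
    daemon_reload: true
```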

  • Are you planning on supporting Ubuntu as well? AFAIK the previous Packer script was usable with Ubuntu-based AMIs.

I was not able to get it to work in my recent testing. See my comments on #22217#note-9. That said, the only thing I think we need is an apt repository definition, which is no big deal, I can add that.
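
For what it's worth, that task could be as small as the following hypothetical sketch (the repository line is illustrative); ansible_distribution_release expands to the release codename on both Debian and Ubuntu:

```yaml
- name: Add the Arvados apt repository (URL shown is illustrative)
  ansible.builtin.apt_repository:
    repo: "deb https://apt.arvados.org/{{ ansible_distribution_release }} {{ ansible_distribution_release }} main"
    state: present
```

In practice it would be paired with a task that installs the repository's signing key.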

  • Would it be convenient/possible to declare & check the expected Ansible (or its plugin) version from Packer?

The plugin documentation describes how you can write a wrapper script that activates a virtualenv and potentially does other work before starting ansible-playbook. That would probably be the place to implement checks like that.
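
A minimal version of such a wrapper might look like the following sketch; the virtualenv path and the pinned version are assumptions from my test setup above, not anything the plugin mandates. The block writes the wrapper out to a file so it can be inspected:

```shell
# Write a hypothetical wrapper that the Packer Ansible plugin could be
# told to run instead of ansible-playbook directly.
cat > ansible-playbook-wrapper.sh <<'EOF'
#!/bin/sh
set -e
# Activate the virtualenv that holds Ansible (path is illustrative).
. "$HOME/ansible/bin/activate"
# Ansible 8 ships ansible-core 2.15; bail out on anything else.
case "$(ansible --version | head -n 1)" in
  *"core 2.15"*) ;;
  *) echo "error: expected Ansible 8 (ansible-core 2.15)" >&2; exit 1 ;;
esac
exec ansible-playbook "$@"
EOF
chmod +x ansible-playbook-wrapper.sh
```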

I am open to doing stuff like this, but personally I would like to hold off on it a little bit. When I wrote this branch, I tried to keep in mind the long-term goal of writing a whole installer based on Ansible. For example, the arvados_apt role should be usable on all nodes, roles like compute_docker could be reused for the shell node, etc.

I would like to be a little further along in that process, and have some idea of how we expect users to configure and run the whole Ansible installer, before we start codifying decisions about how and where Ansible gets installed and configured. I would rather not codify decisions like that now for the compute node builder, only to throw them all out again when we start working on the installer more generally. Because this compute node builder is a very ops-focused thing where the user is expected to be a little more expert, I think it's okay to leave it a little sharp around the edges for the time being.

#8

Updated by Peter Amstutz about 18 hours ago

  • Target version changed from Development 2024-11-20 to Development 2024-12-04