Support #22318 (closed)

Prototype using ansible to set up local test environments

Added by Peter Amstutz 3 months ago. Updated 13 days ago.

Status: Resolved
Priority: Normal
Assigned To: Brett Smith
Category: Deployment
Target version: Development 2025-01-29
Due date:
Story points: -
Release relationship: Auto

Description

For testing the installer, we would like to be able to spin up 1 or more VMs on the local node into which we will install Arvados.

Research using KVM (or VirtualBox or Xen... research virtualization technologies) and the Ansible support for managing VMs (using this module or another appropriate module):

https://docs.ansible.com/ansible/latest/collections/community/libvirt/virt_module.html

The goal is to have an Ansible script that spins up VMs that can talk to one another, which can then be handed off to the installer playbook.

Ultimately we want something that can run both on developer desktops and on Jenkins nodes.

There are two types of virtualization:

Type 1 runs "below" the host kernel.

Type 2 runs "above" the host kernel.

Xen is a type 1 hypervisor that is dedicated to running one or more guest OSes. Apparently the guest OSes also require some special settings to work with Xen.

KVM is a type 1 hypervisor that is part of the Linux kernel. Guest OSes don't require modification. (I've seen it referred to as a type 1 hypervisor, a type 2 hypervisor, and a hybrid.)

VirtualBox is a type 2 hypervisor. It has a GUI and is good for running software where you need to use the desktop within the VM. It is mostly GPL but reportedly has some binary blobs that make it non-free.

There is a bunch of other software for managing VMs, but it seems to be either proprietary (e.g. VMware) or to work with Xen or KVM underneath.

https://cloud.google.com/compute/docs/instances/nested-virtualization/overview

Google says KVM is supported for type 1 virtualization or any type 2 virtualization (I guess?).

KVM needs to be enabled in the Linux kernel. I don't know if that's a feature enabled in the default Debian/Ubuntu kernel, something that can be loaded as a module, or something that possibly requires a custom kernel (I hope not!).
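
For reference, a few standard commands to check whether KVM is available on a given host (nothing specific to this ticket; the kvm_intel/kvm_amd module name depends on the CPU):

lsmod | grep kvm                              # is the kvm / kvm_intel / kvm_amd module loaded?
ls -l /dev/kvm                                # does the device node exist?
sudo apt install cpu-checker && sudo kvm-ok   # Debian/Ubuntu helper that summarizes KVM support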


Subtasks 1 (0 open, 1 closed)

Task #22335: Review 22318-ansible-test-node (Resolved, Brett Smith, 01/06/2025)

Related issues 3 (2 open, 1 closed)

Related to Arvados - Idea #22289: Engineering meeting to discuss packaging vs container deployment going forward (Resolved, Peter Amstutz)
Related to Arvados - Idea #22290: Engineering meeting to discuss single node install plan (New, Peter Amstutz)
Blocks Arvados - Support #22238: Prototype ansible installer (New, Brett Smith)
Actions #1

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz 3 months ago

  • Description updated (diff)
Actions #6

Updated by Brett Smith 3 months ago

Peter Amstutz wrote:

For testing the installer, we would like to be able to spin up 1 or more VMs on the local node into which we will install Arvados.

Research using KVM (or virtualbox or xen... research virtualization technologies) and the ansible support for managing VMs (using this module or other appropriate module)

https://docs.ansible.com/ansible/latest/collections/community/libvirt/virt_module.html

The goal is to have an ansible script that spins up VMs that can talk to one another which can be handed off to the installer playbook.

After some reading and some playing I think we should just go with systemd-nspawn.

Since KVM is meant to virtualize an entire machine, setup is relatively heavyweight. It wants you to specify how many CPU cores, how much RAM, device layout, etc. Setting up the machine is often a matter of actually running your OS's installer. I don't believe we ever need this level of isolation from the host system, and it would create extra hoops we need to jump through for no benefit.

systemd-nspawn avoids all that overhead while providing the isolation we need. Since processes live alongside their host system, they get allocated resources the same way, so they can just use whatever's available (although we have the option to set limits in the future if we want). systemd-nspawn can import Docker images, or installation can be as simple as a single debootstrap command.
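
As a hedged illustration (the container and machine names here are made up), a Docker container's filesystem can be exported and imported as a machine:

docker export some-container > rootfs.tar
sudo machinectl import-tar rootfs.tar arvdev-base

An image imported this way may still need systemd and an SSH server installed before it boots usefully under systemd-nspawn.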

I've tested that you can SSH into a systemd-nspawn machine (which is enough for Ansible to work) and run Docker containers in it.

systemd-nspawn supports all the network topology options we might want with bridges, zones, etc. This will probably be the fussiest part of the setup, requiring the most host integration (I missed standup because I was deep in the weeds of network debugging, oops), but I'm guessing this will be true whether our solution is systemd-nspawn or KVM or Docker or whatever else.

In short, I think systemd-nspawn provides the level of isolation we need, and no more; and that means it will be the easiest virtualization technology to build on top of.

Actions #7

Updated by Peter Amstutz 3 months ago

  • Related to Idea #22289: Engineering meeting to discuss packaging vs container deployment going forward added
Actions #8

Updated by Peter Amstutz 3 months ago

  • Related to Idea #22290: Engineering meeting to discuss single node install plan added
Actions #9

Updated by Peter Amstutz 3 months ago

  • Assigned To set to Brett Smith
Actions #10

Updated by Peter Amstutz 3 months ago

  • Subject changed from Prototype using ansible to set up local VMs to Prototype using ansible to set up local test environments
Actions #11

Updated by Peter Amstutz 3 months ago

Actions #12

Updated by Peter Amstutz 2 months ago

  • Target version changed from Development 2024-12-04 to Development 2025-01-08
Actions #13

Updated by Brett Smith about 2 months ago

22318-ansible-test-node @ cd788d711385f83e4ec5ceb26f6502b1df20b1ef

Based on this ticket's title and the fact that I also own #22819, I prototyped this playbook with the goal of setting up a full test environment.

I have been testing inside a systemd-nspawn virtual machine. This is admittedly somewhat tangential to the original ticket, but I figure if I can get the tests passing inside a virtual machine, that's a pretty good sign that we'll be able to install a whole functional cluster inside one, since we have some tests that basically do exactly that. And the more detailed diagnostics of tests make it easier to track down issues with the virtual machine configuration than a failed installer run.

This has also been useful to demonstrate how continued Ansible development could go:

  • It shows how we can write Ansible roles that can be usefully shared across multiple deployment tasks. With a little reorganization this playbook reuses several roles that were originally written for the compute node builder.
  • It follows the same pattern of "write an Arvados config.yml, Ansible reads configuration from that." This is working well enough I expect this will be the main way to configure all our deployment playbooks.

Admittedly it makes no sense to have this playbook under the compute-images directory but I'm punting that problem while it's a prototype. We can figure out new organization once we're ready to start using this.

Actions #14

Updated by Brett Smith about 2 months ago

I have gotten a systemd-nspawn VM running Debian 12 that passes most tests. At this point I believe the failing tests are caused by unexpected version or configuration differences and not limitations of the VM. In other words, they're bugs that would be fixed in Arvados or in the Ansible playbook I just pushed.

Let's assume you're naming the VM arvdev. Here's what you need:

One-time supervisor host setup

Networking

sudo apt install systemd-container
sudo systemctl enable --now systemd-networkd

Note systemd-networkd only manages configured interfaces. On Debian the default configuration should play nice with NetworkManager.

systemd-networkd runs a DHCP server that provides private addresses to the virtual machines. You will need to configure your firewall to allow these DHCP requests, and to NAT traffic from those interfaces. These steps are specific to the host firewall so I can't write anything more specific here.

The configuration below will create a virtual Ethernet interface named ve-arvdev. See the network options of systemd-nspawn for more information about how interfaces get named.
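
For reference, systemd ships a default .network file that covers these veth interfaces; a minimal sketch of an equivalent configuration (hypothetical file, roughly what the shipped 80-container-ve.network does) looks like this:

[Match]
Name=ve-*
Driver=veth

[Network]
# Hand out private addresses to the container side of the veth pair
DHCPServer=yes
# NAT the containers' traffic (depending on your setup this may be handled
# by the host firewall instead, as noted above)
IPMasquerade=yes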

btrfs

systemd-nspawn stores both images and containers under /var/lib/machines. It works with any filesystem, but if the filesystem is btrfs, it can optimize various operations with snapshots, etc. Here's a blog post outlining some of the gains. I would recommend any production deployment plan on having a dedicated btrfs filesystem mounted at /var/lib/machines.
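
For example, assuming a spare block device /dev/vdb (a made-up name; any dedicated device or partition works):

sudo mkfs.btrfs /dev/vdb
echo '/dev/vdb /var/lib/machines btrfs defaults 0 2' | sudo tee -a /etc/fstab
sudo mount /var/lib/machines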

Build the systemd-nspawn container image

Install Debian/Ubuntu from scratch

sudo debootstrap --include=systemd,openssh-server,sudo bullseye /var/lib/machines/arvdev
sudo systemd-nspawn -D /var/lib/machines/arvdev -a bash
arvdev# systemctl enable systemd-networkd ssh
[set up a user with SSH authorized_keys and sudo for yourself]
arvdev# exit

This is enough to have a fully bootable image that you can SSH into. This image alone would be useful for, e.g., testing the installer. You can also start this container and run Ansible on it to continue building out the image; e.g., install-test-env.yml installs the tools and configuration necessary for the image to run tests.
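
For the bracketed user-setup step above, a minimal sketch (the user name and key are made up) might look like:

arvdev# adduser you
arvdev# usermod -aG sudo you
arvdev# install -d -m 0700 -o you -g you /home/you/.ssh
arvdev# echo 'ssh-ed25519 AAAA… you@example' > /home/you/.ssh/authorized_keys
arvdev# chown you:you /home/you/.ssh/authorized_keys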

Run Ansible

  1. sudo systemctl start systemd-nspawn@arvdev
  2. machinectl list should report arvdev's private IP address
  3. Write an Ansible inventory.ini with one line for it: arvdev ansible_host=192.168.[…]
  4. From arvados/tools/compute-images/ansible:
    ansible-playbook -K -i YOUR_INVENTORY.ini -e arvados_config_file=YOUR_ARVADOS_CONFIG.yml install-test-env.yml

Save the image

This step isn't necessary for testing, but note that now /var/lib/machines/arvdev is a full filesystem that you can save as an image to boot future containers from. There are several ways to save images, from simply copying the directory tree or making a tarball to more involved options; refer to the image commands of machinectl for more information.
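
For instance (file and machine names are made up), machinectl can export the machine directly and re-import it later:

sudo machinectl export-tar arvdev arvdev-base.tar.xz
sudo machinectl import-tar arvdev-base.tar.xz arvdev2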

Set up container privileges necessary to run tests

FUSE, Docker, and Singularity all require additional privileges that VMs normally don't get. To run these components, you need to grant some privileges both in the systemd-nspawn service and in the specific container configuration.

/etc/systemd/system/systemd-nspawn@arvdev.service.d/devices.conf

You could write this override under systemd-nspawn@.service.d (applying to every container) instead of naming a specific container, but then you're elevating privileges for all your VMs. That might be okay for an environment like Jenkins. I wouldn't recommend it for a more experimental environment like your own development machine.

[Service]
# Required for FUSE and Crunch
DeviceAllow=/dev/fuse
# Required for Singularity
DeviceAllow=block-loop rwm

[Unit]
# Make sure loop devices are resolvable so DeviceAllow=block-loop works.
Wants=modprobe@loop.service
After=modprobe@loop.service

/etc/systemd/nspawn/arvdev.nspawn

[Exec]
# Required for Docker
SystemCallFilter=add_key bpf keyctl

# Required for Singularity to run setuid (is this really required?)
PrivateUsers=0

[Files]
# Required for FUSE and Crunch - must be allowed in service too
Bind=/dev/fuse

# Required for Singularity - block-loop creation must be allowed in service too
Bind=/dev/loop-control

[Network]
# This assumes you set up systemd-networkd to serve DHCP as documented above,
# including setting up the necessary firewall rules.
VirtualEthernet=yes

Run the tests

sudo systemctl daemon-reload
sudo systemctl restart systemd-nspawn@arvdev

Now from here, in principle, you can SSH into the VM, clone the Arvados Git repository, and run build/run-tests.sh. You might hit a few small road bumps along the way, and the tests might not pass 100%, because again, this is a prototype and not fully tested. But I have gotten the vast majority of tests to pass, including integration tests with FUSE, Docker, and Singularity, which demonstrates that the VM is capable of running these in a full install.

IMPORTANT DETAIL: I have been running the tests with --skip sanity, because I suspect some of the checks are out-of-date, and I've been using this as an opportunity to figure out which ones those are. I can make a branch that cleans up those checks based on what the Ansible playbook actually installs.
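
To sketch those steps out (the user name and address are illustrative; the repository URL is the public GitHub mirror):

ssh you@192.168.x.y        # the address machinectl list reported
git clone https://github.com/arvados/arvados.git
cd arvados
./build/run-tests.sh --skip sanity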

Actions #15

Updated by Brett Smith about 2 months ago

At standup we talked about building and reusing images and optimizing with btrfs. I have edited the previous comment with more details to help spell all that out.

Agreed the next steps from here are two more playbooks: one to build a base image, and one to configure and prepare a container on the host node. Both are small.

Actions #16

Updated by Brett Smith about 2 months ago

Trying things out on Debian 11, some minor details:

  • The default arvados_postgresql_hba_method: scram-sha-256 is good for recent distros but not Debian 11. I can make this conditional but in the meantime you can just run ansible-playbook with -e arvados_postgresql_hba_method=md5.
  • systemd-nspawn gives you a relatively small /tmp by default, which I think is good, but it means you'll need to be sure to pass a different --temp directory to run-tests.sh (see the example below).
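
For example (the temp directory name here is made up):

mkdir -p ~/test-tmp
./build/run-tests.sh --temp ~/test-tmp --skip sanity
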
Actions #17

Updated by Brett Smith about 1 month ago

Brett Smith wrote in #note-15:

Agreed the next steps from here are two more playbooks: one to build a base image, and one to configure and prepare a container on the host node. Both are small.

Now at b4ddf4dbbea42987090c066de6846a0c8d17b57a with first versions of both of those. build-debian-nspawn-vm.yml automates the "Install Debian/Ubuntu from scratch" process. privilege-nspawn-vm.yml automates the "Set up container privileges necessary to run tests" process.

Actions #18

Updated by Lucas Di Pentima about 1 month ago

I followed the instructions and they worked with a Debian 12 VM and a Debian 11 nspawn container, using Ansible version 8.7.0.

Here are my comments:

  • I needed to install debootstrap, net-tools (both in the host VM and inside the nspawn container) and python3 (inside nspawn container), to take into consideration for the documentation.
  • Is there a way of setting a default value on become to true? I'm not seeing much value in needing to set become: true almost everywhere.
    • This was already answered in today's call: It's nice to have it explicit given that we're building a library of ansible modules that can be reused in different places.
  • When trying to run the install-test-env.yml playbook, I get an error from the postgresql module:
$ ansible-playbook -K -i ~/inventory.ini -e arvados_config_file=~/arvados_config.yml -e arvados_postgresql_hba_method=md5 install-test-env.yml

...

TASK [ansible.builtin.include_role : arvados_postgresql] *******************************************************

TASK [arvados_postgresql : Install PostgreSQL server package] **************************************************
changed: [arvdev]

TASK [arvados_postgresql : Find pg_hba.conf file] **************************************************************
changed: [arvdev]

TASK [arvados_postgresql : Create pg_hba.conf entries] *********************************************************
failed: [arvdev] (item=127.0.0.1/24) => {"ansible_loop_var": "item", "changed": false, "item": "127.0.0.1/24", "module_stderr": "Shared connection to 192.168.165.37 closed.
", "module_stdout": "sudo: unable to resolve host debian-gnu-linux-12: Name or service not known\r

Traceback (most recent call last):
  File \"/home/lucas/.ansible/tmp/ansible-tmp-1735856100.642767-41958-261623601392163/AnsiballZ_postgresql_pg_hba.py\", line 107, in <module>
    _ansiballz_main()
  File \"/home/lucas/.ansible/tmp/ansible-tmp-1735856100.642767-41958-261623601392163/AnsiballZ_postgresql_pg_hba.py\", line 99, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File \"/home/lucas/.ansible/tmp/ansible-tmp-1735856100.642767-41958-261623601392163/AnsiballZ_postgresql_pg_hba.py\", line 47, in invoke_module
    runpy.run_module(mod_name='ansible_collections.community.postgresql.plugins.modules.postgresql_pg_hba', init_globals=dict(_module_fqn='ansible_collections.community.postgresql.plugins.modules.postgresql_pg_hba', _modlib_path=modlib_path),
  File \"/usr/lib/python3.9/runpy.py\", line 210, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File \"/usr/lib/python3.9/runpy.py\", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File \"/usr/lib/python3.9/runpy.py\", line 87, in _run_code
    exec(code, run_globals)
  File \"/tmp/ansible_community.postgresql.postgresql_pg_hba_payload_z3zr4rja/ansible_community.postgresql.postgresql_pg_hba_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_pg_hba.py\", line 907, in <module>
  File \"/tmp/ansible_community.postgresql.postgresql_pg_hba_payload_z3zr4rja/ansible_community.postgresql.postgresql_pg_hba_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_pg_hba.py\", line 879, in main
  File \"/tmp/ansible_community.postgresql.postgresql_pg_hba_payload_z3zr4rja/ansible_community.postgresql.postgresql_pg_hba_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_pg_hba.py\", line 428, in add_rule
  File \"/tmp/ansible_community.postgresql.postgresql_pg_hba_payload_z3zr4rja/ansible_community.postgresql.postgresql_pg_hba_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_pg_hba.py\", line 617, in key
KeyError: 'db'
", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
failed: [arvdev] (item=::1/128) => {"ansible_loop_var": "item", "changed": false, "item": "::1/128", "module_stderr": "Shared connection to 192.168.165.37 closed.
", "module_stdout": "sudo: unable to resolve host debian-gnu-linux-12: Name or service not known\r

Traceback (most recent call last):
  File \"/home/lucas/.ansible/tmp/ansible-tmp-1735856101.072742-41958-171062904954997/AnsiballZ_postgresql_pg_hba.py\", line 107, in <module>
    _ansiballz_main()
  File \"/home/lucas/.ansible/tmp/ansible-tmp-1735856101.072742-41958-171062904954997/AnsiballZ_postgresql_pg_hba.py\", line 99, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File \"/home/lucas/.ansible/tmp/ansible-tmp-1735856101.072742-41958-171062904954997/AnsiballZ_postgresql_pg_hba.py\", line 47, in invoke_module
    runpy.run_module(mod_name='ansible_collections.community.postgresql.plugins.modules.postgresql_pg_hba', init_globals=dict(_module_fqn='ansible_collections.community.postgresql.plugins.modules.postgresql_pg_hba', _modlib_path=modlib_path),
  File \"/usr/lib/python3.9/runpy.py\", line 210, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File \"/usr/lib/python3.9/runpy.py\", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File \"/usr/lib/python3.9/runpy.py\", line 87, in _run_code
    exec(code, run_globals)
  File \"/tmp/ansible_community.postgresql.postgresql_pg_hba_payload_ay8hntee/ansible_community.postgresql.postgresql_pg_hba_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_pg_hba.py\", line 907, in <module>
  File \"/tmp/ansible_community.postgresql.postgresql_pg_hba_payload_ay8hntee/ansible_community.postgresql.postgresql_pg_hba_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_pg_hba.py\", line 879, in main
  File \"/tmp/ansible_community.postgresql.postgresql_pg_hba_payload_ay8hntee/ansible_community.postgresql.postgresql_pg_hba_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_pg_hba.py\", line 428, in add_rule
  File \"/tmp/ansible_community.postgresql.postgresql_pg_hba_payload_ay8hntee/ansible_community.postgresql.postgresql_pg_hba_payload.zip/ansible_collections/community/postgresql/plugins/modules/postgresql_pg_hba.py\", line 617, in key
KeyError: 'db'
", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

PLAY RECAP *****************************************************************************************************
arvdev                     : ok=25   changed=20   unreachable=0    failed=1    skipped=2    rescued=0    ignored=0

After some time I realized I didn't set the arvados_cluster.PostgreSQL.Connection.dbname config, so I guess Ansible wanted to work with an empty string value. Maybe we can fail on that with a clear message?

As mentioned on the call, I'm also getting hard to understand errors when running the build-debian-nspawn-vm.yml playbook and passing the ssh pubkey content. I guess that if we need to format that in a special way so that Ansible understands it, maybe it would be better to pass the pubkey file path instead of its contents and handle formatting inside the playbook to avoid headaches on users that are not familiar with Ansible quirks.

I think tools/compute-images/ansible/privilege-nspawn-vm.yml is missing become: true/yes on several (if not all tasks). Running it with -b as you pointed out in the call makes it succeed.

In tools/compute-images/ansible/build-debian-nspawn-vm.yml, if no image_passhash is passed, wdyt about giving the admin user password-less sudo access? I think it could be nice for automation purposes. If security is a concern, we could instead make the image_passhash parameter mandatory.

Actions #19

Updated by Brett Smith about 1 month ago

  • Status changed from New to In Progress

Lucas Di Pentima wrote in #note-18:

  • I needed to install debootstrap, net-tools (both in the host VM and inside the nspawn container) and python3 (inside nspawn container), to take into consideration for the documentation.

For the host stuff: sure. Right now I don't think documentation is in scope because this is "just" a "prototype" ticket. Before I go documenting how to use it more formally, it would help to better scope the audience. Are we literally just deploying this on Curii's Jenkins server? Other internal servers? Is it supposed to be usable by outside contributors for… something? Having those decisions made would help decide what the documentation needs to cover.

For stuff inside the container, I don't understand why this was necessary because it should already be covered:

  • build-debian-nspawn-vm.yml runs debootstrap with --include=python3 which should preinstall it.
  • install-test-env.yml installs net-tools pretty early.

Can you say more about what the problem you encountered was? In general, I think it's great that you're willing to paper over minor issues to keep trucking on the branch review, but when getting feedback like this, it would really help me to know what the original problem was so I can know what all the options for dealing with it are. Right now I'm at a loss for why the code that's already there didn't accomplish what you say you did.

  • Is there a way of setting a default value on become to true? I'm not seeing much value in needing to set become: true almost everywhere.
    • This was already answered in today's call: It's nice to have it explicit given that we're building a library of ansible modules that can be reused in different places.

Also note that you can run ansible-playbook with -b to have every task become by default.

I'll say in general, Ansible gives you a lot of flexibility in deciding how to go about running a particular playbook. Especially for something that's a prototype, right now I'm erring on the side of specifying relatively little, and giving the administrator maximum control through Ansible itself, to avoid committing us to a particular workflow too early. I understand since this is new to the rest of the team, nobody else knows what all those options are (and to be clear, I don't even know all the options either, and maybe not even most of them, but some of them). But I hope we can keep talking about the different options that are available in these early stages.

After some time I realized I didn't set the arvados_cluster.PostgreSQL.Connection.dbname config, so I guess Ansible wanted to work with an empty string value. Maybe we can fail on that with a clear message?

In general, I'm not opposed to this. However, I would like to think a little about how we're going to scale it. As we work towards building a full cluster installer, we might potentially want to check for the definition of dozens of config.yml settings (see all the settings that the Salt installer currently takes). Off the top of my head, I don't know a way to do that that isn't going to become difficult to maintain over time.
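
For reference, one shape this could take (a hypothetical sketch, not something in the branch) is an assert task near the top of the playbook:

- name: Check that required cluster settings are present
  ansible.builtin.assert:
    that:
      - arvados_cluster.PostgreSQL.Connection.dbname | default('') | length > 0
    fail_msg: "PostgreSQL.Connection.dbname must be set in your Arvados config.yml"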

As mentioned on the call, I'm also getting hard to understand errors when running the build-debian-nspawn-vm.yml playbook and passing the ssh pubkey content.

As discussed, the issue is that ansible-playbook -e tries to parse a multi-word argument as multiple assignments. You can get around this by YAML-quoting values inside the passed string; e.g., say:

ansible-playbook -e 'image_name=bookworm image_authorized_keys="ssh-rsa AAA…"'

There are other options. See the documentation.

I guess that if we need to format that in a special way so that Ansible understands it, maybe it would be better to pass the pubkey file path instead of its contents and handle formatting inside the playbook to avoid headaches on users that are not familiar with Ansible quirks.

This goes back to what I said earlier about documentation and scoping. I'm not opposed to making this more friendly, but it would help to know who the audience is. The simplest thing would probably be to have people write these settings in a YAML file and then run ansible-playbook -e @my_settings.yml.
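
For instance (the file name is made up; the variables are the ones from the -e example above), my_settings.yml could contain:

image_name: bookworm
image_authorized_keys: "ssh-ed25519 AAAA… you@example.com"

and then: ansible-playbook -e @my_settings.yml build-debian-nspawn-vm.yml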

I think tools/compute-images/ansible/privilege-nspawn-vm.yml is missing become: true/yes on several (if not all tasks). Running it with -b as you pointed out in the call makes it succeed.

Yeah I'll add this.

In tools/compute-images/ansible/build-debian-nspawn-vm.yml, if no image_passhash is passed, wdyt about giving the admin user password-less sudo access? I think it could be nice for automation purposes. If security is a concern, we could instead make the image_passhash parameter mandatory.

In my testing, at least on Debian 12, it already works this way. If the user provides no password, the account is created with a locked password (the default image_passhash: "!"). Given that the account belongs to the sudo group, it gets permission to use sudo passwordless. I guess sudo itself notices the password is locked and waives the password requirement in this case.

Actions #20

Updated by Lucas Di Pentima about 1 month ago

Brett Smith wrote in #note-19:

For the host stuff: sure. Right now I don't think documentation is in scope because this is "just" a "prototype" ticket. Before I go documenting how to use it more formally, it would help to better scope the audience. Are we literally just deploying this on Curii's Jenkins server? Other internal servers? Is it supposed to be usable by outside contributors for… something? Having those decisions made would help decide what the documentation needs to cover.

I agree, I was just commenting on your detailed instructions in note-14 and what my experience was following them.

For stuff inside the container, I don't understand why this was necessary because it should already be covered:

  • build-debian-nspawn-vm.yml runs debootstrap with --include=python3 which should preinstall it.
  • install-test-env.yml installs net-tools pretty early.

Can you say more about what the problem you encountered was? In general, I think it's great that you're willing to paper over minor issues to keep trucking on the branch review, but when getting feedback like this, it would really help me to know what the original problem was so I can know what all the options for dealing with it are. Right now I'm at a loss for why the code that's already there didn't accomplish what you say you did.

Yes, sorry. My notes covered multiple passes and commits from your side, so the first bullet point was more a comment on the initial instructions, when the other playbooks didn't exist yet; I should have clarified that.
Given that this is a prototype, documentation isn't really in scope, but I figured it was of some value to mention what additional steps were required to make things work.

  • Is there a way of setting a default value on become to true? I'm not seeing much value in needing to set become: true almost everywhere.

Also note that you can run ansible-playbook with -b to have every task become by default.

Yeah, my original question was more along the lines of "why do we need to add a become: true line in every task when most of them will require it (since this is an admin automation tool)?", so I was thinking of having that enabled by default and saving a lot of lines in our playbooks. But OTOH you mentioned that you preferred not to use become on software-building tasks, and I agree.

This goes back to what I said earlier about documentation and scoping. I'm not opposed to making this more friendly, but it would help to know who the audience is. The simplest thing would probably be to have people write these settings in a YAML file and then run ansible-playbook -e @my_settings.yml.

Yes, that could be the way to go. As you say, it isn't very clear who the audience is, so as a reviewer it's not easy to put myself in the shoes of someone who is not clearly defined yet. My feedback was mainly based on the struggles experienced while trying to make this work as someone with near zero ansible experience.

In my testing, at least on Debian 12, it already works this way. If the user provides no password, the account is created with a locked password (the default image_passhash: "!"). Given that the account belongs to the sudo group, it gets permission to use sudo passwordless. I guess sudo itself notices the password is locked and waives the password requirement in this case.

I might remember this wrong, but I think that in my case, password-less sudo didn't work with passhash set to '!', maybe because of using Debian 11.

In summary, the goal stated in the description ("an ansible script that spins up VMs that can talk to one another which can be handed off to the installer playbook") was achieved in my eyes, so this LGTM. Thanks!

Actions #21

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2025-01-08 to Development 2025-01-29
Actions #22

Updated by Brett Smith about 1 month ago

  • Status changed from In Progress to Resolved
Actions #23

Updated by Peter Amstutz 13 days ago

  • Release set to 75