Idea #20482 (closed)

Installer's terraform code supports customized AWS deployment

Added by Lucas Di Pentima 12 months ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Deployment
Story points: 2.0
Release relationship: Auto

Description

The current terraform code deploys an infrastructure that includes nodes with public IP addresses for external access, creates its own VPC & subnets, uses a predefined AMI, etc. This can conflict with the policies of organizations that want to use it.

The Terraform config should allow the operator to supply custom parameters when creating a multi-node Arvados cluster.


Subtasks 3 (0 open, 3 closed)

Task #20483: Review 20482-terraform-custom-configs (Resolved, Lucas Di Pentima, 05/10/2023)
Task #20515: Review 20482-further-terraform-improvements (Resolved, Lucas Di Pentima, 05/16/2023)
Task #20523: Review 20482-installer-improvements (Resolved, Lucas Di Pentima, 05/19/2023)
Actions #1

Updated by Lucas Di Pentima 12 months ago

  • Status changed from New to In Progress
  • Description updated (diff)
Actions #2

Updated by Peter Amstutz 11 months ago

  • Target version changed from Development 2023-05-10 sprint to Development 2023-05-24 sprint
Actions #3

Updated by Lucas Di Pentima 11 months ago

  • Description updated (diff)
  • Subject changed from Installer's terraform code supports private-only AWS deployments to Installer's terraform code supports customized AWS deployment
Actions #4

Updated by Lucas Di Pentima 11 months ago

Updates at 648c8c4 - branch 20482-terraform-custom-configs

This branch has quite a few commits; I hope I've organized them in an easy-to-follow way (see the sketch after the list for how some of these options might be exposed as variables):

  • Fixes the way the keepstore S3 bucket is created, due to changes in AWS's defaults (see: https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-s3-automatically-enable-block-public-access-disable-access-control-lists-buckets-april-2023/)
  • Allows the site admin to create a non-public Arvados cluster: no public IP addresses are requested and no split horizon route53 DNS setup is done.
  • Fixes the way we handle the cluster's domain name, allowing the configuration to support domain name formats other than "cluster_prefix.domain".
  • Allows setting a custom VPC, subnets & security group instead of creating its own set.
  • Allows setting a custom deploy user name.
  • Implements authorized SSH keys without using AWS key pairs to improve security.
  • Allows using a custom AMI for the service nodes instead of Debian 11, still retaining Debian as the default.
  • Allows using a custom set of tags for every resource.
  • Exposes private IP address & DNS alias configs as variables.
  • Adds a proper compute node instance profile that grants access to keepstore's S3 bucket in addition to the permissions needed for the EBS Autoscaler to work.
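
A minimal sketch of how some of these options might be surfaced as Terraform variables (the names and defaults below are illustrative assumptions, not necessarily the ones used in the branch):

    # Hypothetical variable definitions; names and defaults are assumptions.
    variable "vpc_id" {
      description = "Use an existing VPC instead of creating one; empty means create a new VPC"
      type        = string
      default     = ""
    }

    variable "instance_ami" {
      description = "Custom AMI for the service nodes; empty falls back to the default Debian 11 AMI"
      type        = string
      default     = ""
    }

    variable "custom_tags" {
      description = "Extra tags to apply to every resource"
      type        = map(string)
      default     = {}
    }

    variable "deploy_user" {
      description = "Name of the deploy user created on the service nodes"
      type        = string
      default     = "admin"
    }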
Actions #5

Updated by Lucas Di Pentima 11 months ago

Updates at d73105e0d

Just a couple fixes:

  • Set the vpc/ module's output related to the Let's Encrypt IAM key id to sensitive.
  • Fixes the PassRole policy's target to point to the newly created compute node instance profile.
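
A hedged sketch of what such a scoped PassRole policy could look like (the data source, resource and role names are assumptions):

    # Hypothetical: restrict iam:PassRole to the role behind the new compute node instance profile.
    data "aws_iam_policy_document" "dispatcher_pass_role" {
      statement {
        actions   = ["iam:PassRole"]
        resources = [aws_iam_role.compute_node_role.arn]
      }
    }

    resource "aws_iam_policy" "dispatcher_pass_role" {
      name   = "arvados-dispatcher-pass-role"
      policy = data.aws_iam_policy_document.dispatcher_pass_role.json
    }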

Manually tested the new installer from scratch with defaults, setting the 3 main config values. The diagnostics run was OK.

Actions #6

Updated by Lucas Di Pentima 11 months ago

Updates at c311969bd

Some more improvements:

  • Allows specifying the instance type & volume size per service node (see the sketch below).
  • Improves the instance profile assignment code for readability.
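
A hedged sketch of how per-node overrides like these might be declared (variable names, keys and sizes are assumptions):

    # Hypothetical: per-host overrides, falling back to the "default" key.
    # Usage inside the instance resource could look like:
    #   instance_type = lookup(var.instance_type, each.key, var.instance_type["default"])
    variable "instance_type" {
      description = "Instance type per service node; the 'default' key applies to unlisted nodes"
      type        = map(string)
      default     = { default = "m5a.large" }
    }

    variable "instance_volume_size" {
      description = "Root volume size in GiB per service node"
      type        = map(number)
      default     = { default = 20, controller = 100 }
    }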
Actions #7

Updated by Lucas Di Pentima 11 months ago

Updates at 1cabb7a57

  • Sets required terraform & AWS provider versions to avoid incompatibility issues.
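
For illustration, version pinning of this kind usually looks like the block below (the exact constraints here are assumptions, not necessarily what the branch pins):

    terraform {
      # Hypothetical constraints; the branch may pin different versions.
      required_version = "~> 1.3"

      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 4.0"
        }
      }
    }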
Actions #8

Updated by Brett Smith 11 months ago

Reviewing @ c311969bdd03f411a202983e7a0a11f4d9901243. Thank you for the commit organization, it really was a huge help. Each commit felt pretty easy to follow individually.

  • For having a private-only cluster setting, what do you think about getting rid of the private_only variable, and then determining whether or not the cluster is private by checking whether user_facing_hosts is empty? The main job of private_only seems to be to force user_facing_hosts to be treated the same as internal_service_hosts. Getting rid of the variable could help avoid a situation where user_facing_hosts doesn't get interpreted the way an administrator expects because they're not looking at/thinking about private_only.
  • It looks like the format of domain_name is changing. Is it really, or is the example just meant to be more illustrative? If the format really is changing, could the change affect existing users? Does it need to be documented?
  • For the SSH key install, you can install it more securely by using install to create files and directories with the right permissions immediately, rather than patching them up after the fact. Try: install -d -o ${deploy_user} -g ${deploy_user} -m 700 $${SSH_DIR} and echo "${ssh_pubkey}" | install -o ${deploy_user} -g ${deploy_user} -m 600 /dev/stdin $${SSH_DIR}/authorized_keys

    Even if you don't take this suggestion, I think you might need to set the ownership of the authorized_keys file at least, similarly to the way you do for the .ssh directory now.

Thanks.

Actions #9

Updated by Lucas Di Pentima 11 months ago

Updates at 29725dfdd

Brett Smith wrote in #note-8:

  • For having a private-only cluster setting, what do you think about getting rid of the private_only variable, and then determining whether or not the cluster is private by checking whether user_facing_hosts is empty? The main job of private_only seems to be to force user_facing_hosts to be treated the same as internal_service_hosts. Getting rid of the variable could help avoid a situation where user_facing_hosts doesn't get interpreted the way an administrator expects because they're not looking at/thinking about private_only.

As I mentioned on chat, private_only mainly controls public IP address creation and assignment & the Route53 public zone.
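
A hedged illustration of that pattern (the resource, local and variable names are assumptions):

    # Hypothetical: these resources only exist when the cluster is public.
    resource "aws_eip" "service_node" {
      count = var.private_only ? 0 : length(local.public_hosts)
      vpc   = true
    }

    resource "aws_route53_zone" "public_zone" {
      count = var.private_only ? 0 : 1
      name  = var.domain_name
    }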

  • It looks like the format of domain_name is changing. Is it really, or is the example just meant to be more illustrative? If the format really is changing, could the change affect existing users? Does it need to be documented?

Yes, thanks for the reminder. I've added an upgrade note for it; although we don't have an official way of upgrading the installer, I think it'll be useful.

  • For the SSH key install, you can install it more securely by using install to create files and directories with the right permissions immediately, rather than patching them up after the fact. Try: install -d -o ${deploy_user} -g ${deploy_user} -m 700 $${SSH_DIR} and echo "${ssh_pubkey}" | install -o ${deploy_user} -g ${deploy_user} -m 600 /dev/stdin $${SSH_DIR}/authorized_keys

I keep forgetting about the install command, thanks for the reminder! I've updated the user_data script following your suggestions.
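
For context, a minimal sketch (file name and variable wiring are assumptions) of how such a user_data script is typically fed its values from Terraform; inside the template, $${SSH_DIR} is escaped so templatefile() leaves it for the shell, while ${deploy_user} and ${ssh_pubkey} are substituted at render time:

    resource "aws_instance" "service_node" {
      ami           = var.instance_ami
      instance_type = "m5a.large"

      # Hypothetical template path; deploy_user and ssh_pubkey become template variables.
      user_data = templatefile("${path.module}/user_data.sh.tftpl", {
        deploy_user = var.deploy_user
        ssh_pubkey  = var.ssh_pubkey
      })
    }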

Actions #10

Updated by Brett Smith 11 months ago

Lucas Di Pentima wrote in #note-9:

Yes, thanks for the reminder. I've added an upgrade note for it; although we don't have an official way of upgrading the installer, I think it'll be useful.

Thanks for adding this. The headline is a little generic; what do you think about mentioning the setting, something like "Multi-node installer domain_name configuration change"?

This is good to merge, thanks.

Actions #11

Updated by Lucas Di Pentima 11 months ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #12

Updated by Lucas Di Pentima 11 months ago

  • Status changed from Resolved to In Progress

Have some additional fixes on terraform at a41f906 - branch 20482-further-terraform-improvements

  • Syntax fixes on variable definition files.
  • Avoid creating an S3 endpoint when a preexisting VPC is specified; it's assumed that VPC will already have its S3 access configured (see the sketch after this list).
  • Make the installer.sh script compatible with Ubuntu 18.04 by not using options only available in newer git versions.
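
A hedged sketch of that conditional endpoint (resource and variable names are assumptions):

    # Hypothetical: only create the S3 gateway endpoint when we also create the VPC.
    resource "aws_vpc_endpoint" "s3" {
      count        = var.vpc_id == "" ? 1 : 0
      vpc_id       = aws_vpc.arvados_vpc[0].id
      service_name = "com.amazonaws.${var.region_name}.s3"
    }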
Actions #13

Updated by Brett Smith 11 months ago

Lucas Di Pentima wrote in #note-12:

Have some additional fixes on terraform at a41f906 - branch 20482-further-terraform-improvements

This is good to merge, thanks.

Actions #14

Updated by Lucas Di Pentima 11 months ago

Updates at 7c048d9 - branch 20482-installer-improvements
Test run: developer-run-tests: #3653

  • Overrides arvados-formula's shellinabox config templates to allow using arbitrary domains for the cluster.
  • Updates salt pillars to avoid always prepending the cluster prefix to the cluster domain.
  • Updates installer.sh, provision.sh and local.params so that they handle ${DOMAIN} as the full cluster base domain.
  • Fixes Terraform code by exporting the previously removed vpc_cidr output under a more correct name: cluster_int_cidr (see the sketch after this list).
  • Updates the multi-host installer docs to reflect recent changes.
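
A hedged sketch of the renamed output (local.cluster_cidr is an assumed name for wherever the VPC's CIDR ends up):

    # Hypothetical: expose the cluster's internal network CIDR under its new name;
    # local.cluster_cidr is assumed to hold the CIDR of the created or preexisting VPC.
    output "cluster_int_cidr" {
      value = local.cluster_cidr
    }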
Actions #15

Updated by Brett Smith 11 months ago

Lucas Di Pentima wrote in #note-14:

Updates at 7c048d9 - branch 20482-installer-improvements

Should we add an upgrade note about the change to DOMAIN like we did for Terraform? I realize the Salt installer isn't intended to be re-run the same way that Terraform is, but it's not impossible to imagine someone has integrated it into some larger install process. Even just adding a quick sentence or two about this to the existing section would be fine, I think.

"For public long-lived clusters, we recommend reserving a cluster id." - I admittedly don't know much about our cluster id reservation process, so because of that, I'm not sure I understand why adding the word "public" is a good change. Clusters that start private could become public in the future. Wouldn't it be good to have their ids reserved to avoid potential collisions when that happens?

I'm not sure why this branch needs to make changes to .licenseignore. The Terraform vars includes already have license notices at the top. The Jinja templates seem like they could have license notices by having a block comment at the top. Is there some reason that's not a good idea?

Actions #16

Updated by Lucas Di Pentima 11 months ago

Updates at 7b73360 (rebased branch)

Brett Smith wrote in #note-15:

Should we add an upgrade note about the change to DOMAIN like we did for Terraform? I realize the Salt installer isn't intended to be re-run the same way that Terraform is, but it's not impossible to imagine someone has integrated it into some larger install process. Even just adding a quick sentence or two about this to the existing section would be fine, I think.

Good point, I've updated the upgrade notice.

"For public long-lived clusters, we recommend reserving a cluster id." - I admittedly don't know much about our cluster id reservation process, so because of that, I'm not sure I understand why adding the word "public" is a good change. Clusters that start private could become public in the future. Wouldn't it be good to have their ids reserved to avoid potential collisions when that happens?

Yeah, it seemed to me a trivial doc fix, but given that there might be other points of view, I removed it and left it as it was before.

I'm not sure why this branch needs to make changes to .licenseignore. The Terraform vars includes already have license notices at the top. The Jinja templates seem like they could have license notices by having a block comment at the top. Is there some reason that's not a good idea?

Thanks, I've added the commented copyright notices and double-checked they work by deploying a test cluster. For the Terraform var symlinked files, there's some kind of issue in the copyright script: it should ignore the symlinks, but I don't yet understand why it isn't doing so in this case. I didn't want to spend more time on that for the moment, so I kept those files in .licenseignore.

Actions #17

Updated by Brett Smith 11 months ago

Lucas Di Pentima wrote in #note-16:

Updates at 7b73360 (rebased branch)

Thanks for the changes. This lgtm.

Actions #18

Updated by Lucas Di Pentima 11 months ago

  • % Done changed from 66 to 100
  • Status changed from In Progress to Resolved
Actions #20

Updated by Peter Amstutz 8 months ago

  • Release set to 66