Idea #20482 (Closed)
Installer's terraform code supports customized AWS deployment
Added by Lucas Di Pentima over 1 year ago. Updated about 1 year ago.
Description
The current terraform code deploys an infrastructure that includes nodes with public IP addresses for external access, creates its own VPC & subnets, uses a predefined AMI, and so on. This can conflict with the policies of organizations that want to use it.
Terraform config should allow the operator to request custom parameters when creating a multi-node Arvados cluster.
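For illustration only, the requested customization might surface as a terraform.tfvars along these lines; every variable name and value below is a hypothetical placeholder, not the installer's actual interface:

# Hypothetical terraform.tfvars sketch; all names and values are placeholders.
region_name  = "us-east-1"
cluster_name = "xarv1"
domain_name  = "example.com"            # full base domain, not forced to prefix.domain
private_only = true                     # no public IPs, no public Route53 zone
vpc_id       = "vpc-0123456789abcdef0"  # reuse a preexisting VPC
instance_ami = "ami-0abcdef1234567890"  # custom AMI instead of the default Debian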
Updated by Lucas Di Pentima over 1 year ago
- Status changed from New to In Progress
- Description updated (diff)
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-05-10 sprint to Development 2023-05-24 sprint
Updated by Lucas Di Pentima over 1 year ago
- Description updated (diff)
- Subject changed from Installer's terraform code supports private-only AWS deployments to Installer's terraform code supports customized AWS deployment
Updated by Lucas Di Pentima over 1 year ago
Updates at 648c8c4 - branch 20482-terraform-custom-configs
This branch has a fair number of commits; I hope I have organized them in an easy-to-follow way:
- Fixes the way the keepstore S3 bucket is created, due to changes in AWS's defaults (see: https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-s3-automatically-enable-block-public-access-disable-access-control-lists-buckets-april-2023/ and the sketch after this list).
- Allows the site admin to create a non-public Arvados cluster: no public IP addresses are requested and no split horizon route53 DNS setup is done.
- Fixes the way we handle the cluster's domain name, allowing the configuration to support domain formats other than "cluster_prefix.domain".
- Allows setting custom VPC, subnets & security group instead of creating its own set.
- Allows setting a custom deploy user name.
- Implements authorized SSH keys without using AWS key pairs to improve security.
- Allows using a custom AMI for the service nodes instead of Debian 11, while retaining Debian as the default.
- Allows using a custom set of tags for every resource.
- Exposes private IP address & DNS alias configs as variables.
- Adds a proper compute node instance profile that grants access to keepstore's S3 bucket, in addition to the permissions needed for the EBS Autoscaler to work.
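Regarding the first item: since April 2023 new S3 buckets block public access and disable ACLs by default, so bucket ownership has to be declared explicitly rather than via an ACL. A minimal sketch of that pattern, with hypothetical resource and variable names (not the branch's actual code):

# Sketch only; "keep_volume" and var.keep_bucket_name are hypothetical.
resource "aws_s3_bucket" "keep_volume" {
  bucket = var.keep_bucket_name
}

# New AWS defaults disable ACLs, so declare ownership explicitly
# instead of passing an "acl" argument to the bucket resource.
resource "aws_s3_bucket_ownership_controls" "keep_volume" {
  bucket = aws_s3_bucket.keep_volume.id
  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}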
Updated by Lucas Di Pentima over 1 year ago
Updates at d73105e0d
Just a couple of fixes:
- Sets vpc/'s output related to the letsencrypt IAM key id to sensitive.
- Fixes the PassRole policy's target to point to the newly created compute node instance profile.
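Roughly, those two fixes take this shape; this is a sketch with hypothetical names, not the actual diff:

# Sketch only; resource and output names are hypothetical.
output "letsencrypt_iam_access_key_id" {
  value     = aws_iam_access_key.letsencrypt.id
  sensitive = true  # keep the key id out of plain "terraform output" listings
}

data "aws_iam_policy_document" "pass_role" {
  statement {
    actions = ["iam:PassRole"]
    # Point at the newly created compute node role.
    resources = [aws_iam_role.compute_node.arn]
  }
}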
Manually tested the new installer from scratch with defaults, setting the 3 main config values. Diagnostics run was OK.
Updated by Lucas Di Pentima over 1 year ago
Updates at c311969bd
Some more improvements:
- Allows specifying the instance type & volume size per service node.
- Improves the instance profile assignment code for readability.
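One plausible shape for the per-node settings, using hypothetical variable names (the branch's actual interface may differ):

# Hypothetical sketch of per-service-node overrides; var.hostnames is assumed.
variable "default_instance_type" {
  type    = string
  default = "m5a.large"
}

variable "instance_type" {
  description = "Per-hostname instance type overrides, e.g. { keep0 = \"m5a.xlarge\" }"
  type        = map(string)
  default     = {}
}

locals {
  # Fall back to the default when a host has no explicit override.
  instance_type = {
    for host in var.hostnames :
    host => lookup(var.instance_type, host, var.default_instance_type)
  }
}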
Updated by Lucas Di Pentima over 1 year ago
Updates at 1cabb7a57
- Sets required terraform & AWS provider versions to avoid incompatibility issues.
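Such pinning typically looks like the following; the constraints shown are illustrative, not the branch's exact values:

terraform {
  # Illustrative version constraints; the branch pins its own tested versions.
  required_version = ">= 1.3.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}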
Updated by Brett Smith over 1 year ago
Reviewing @ c311969bdd03f411a202983e7a0a11f4d9901243. Thank you for the commit organization, it really was a huge help. Each commit felt pretty easy to follow individually.
- For having a private-only cluster setting, what do you think about getting rid of the private_only variable, and then determining whether or not the cluster is private by checking whether user_facing_hosts is empty? That seems to be the main job of private_only: forcing user_facing_hosts to be treated the same as internal_service_hosts. Getting rid of the variable could help avoid a situation where user_facing_hosts doesn't get interpreted the way an administrator expects because they're not looking at/thinking about private_only.
- It looks like the format of domain_name is changing. Is it really, or is the example just meant to be more illustrative? If the format really is changing, could the change affect existing users? Does it need to be documented?
- For the SSH key install, you can install it more securely by using install to create files and directories with the right permissions immediately, rather than patching them up after the fact. Try:
install -d -o ${deploy_user} -g ${deploy_user} -m 700 $${SSH_DIR}
and
echo "${ssh_pubkey}" | install -o ${deploy_user} -g ${deploy_user} -m 600 /dev/stdin $${SSH_DIR}/authorized_keys
Even if you don't take this suggestion, I think you might need to set the ownership of the authorized_keys file at least, similarly to the way you do for the .ssh directory now.
Thanks.
Updated by Lucas Di Pentima over 1 year ago
Updates at 29725dfdd
Brett Smith wrote in #note-8:
- For having a private-only cluster setting, what do you think about getting rid of the private_only variable, and then determining whether or not the cluster is private by checking whether user_facing_hosts is empty? That seems to be the main job of private_only: forcing user_facing_hosts to be treated the same as internal_service_hosts. Getting rid of the variable could help avoid a situation where user_facing_hosts doesn't get interpreted the way an administrator expects because they're not looking at/thinking about private_only.
As I mentioned on chat, private_only mainly aims to control public IP address creation and assignment, and the Route53 public zone.
- It looks like the format of domain_name is changing. Is it really, or is the example just meant to be more illustrative? If the format really is changing, could the change affect existing users? Does it need to be documented?
Yes, thanks for the reminder. I've added an upgrade note for it; although we don't have an official way of upgrading the installer, I think it'll be useful.
- For the SSH key install, you can install it more securely by using install to create files and directories with the right permissions immediately, rather than patching them up after the fact. Try:
install -d -o ${deploy_user} -g ${deploy_user} -m 700 $${SSH_DIR}
and
echo "${ssh_pubkey}" | install -o ${deploy_user} -g ${deploy_user} -m 600 /dev/stdin $${SSH_DIR}/authorized_keys
I keep forgetting about the install command, thanks for the reminder! I've updated the user_data script following your suggestions.
Updated by Brett Smith over 1 year ago
Lucas Di Pentima wrote in #note-9:
Yes, thanks for the reminder. I've added an upgrade note for it; although we don't have an official way of upgrading the installer, I think it'll be useful.
Thanks for adding this. The headline is a little generic; what do you think about mentioning the setting, something like "Multi-node installer domain_name configuration change"?
This is good to merge, thanks.
Updated by Lucas Di Pentima over 1 year ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|ab9fdb73256fb57a48aad2161fa022146031f7ca.
Updated by Lucas Di Pentima over 1 year ago
- Status changed from Resolved to In Progress
Have some additional fixes on terraform at a41f906 - branch 20482-further-terraform-improvements
- Syntax fixes on variable definition files.
- Avoid creating an S3 endpoint when specifying a VPC. It's assumed that the preexisting VPC will already have its S3 access configured (see the sketch after this list).
- Make the installer.sh script compatible with Ubuntu 18.04 by not using newer git version options.
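The conditional endpoint creation might look like this sketch; the variable and resource names are hypothetical:

# Sketch: create the S3 gateway endpoint only when the installer also created
# the VPC itself; a preexisting VPC is assumed to provide its own S3 access.
# aws_vpc.arvados is assumed to be created with a matching count.
resource "aws_vpc_endpoint" "s3" {
  count        = var.vpc_id == "" ? 1 : 0
  vpc_id       = aws_vpc.arvados[0].id
  service_name = "com.amazonaws.${var.region_name}.s3"
}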
Updated by Brett Smith over 1 year ago
Updated by Lucas Di Pentima over 1 year ago
Updates at 7c048d9 - branch 20482-installer-improvements
Test run: developer-run-tests: #3653
- Overrides arvados-formula's shellinabox config templates to allow using arbitrary domains for the cluster.
- Updates salt pillars to avoid always sticking the cluster prefix on the cluster domain.
- Updates installer.sh, provision.sh and local.params so that they handle ${DOMAIN} as the full cluster base domain.
- Fixes Terraform code by exporting the previously removed vpc_cidr output, with a more correct name: cluster_int_cidr (see the sketch after this list).
- Updates the multi host installer docs to reflect recent changes.
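The restored output would be along these lines; "arvados" is a hypothetical VPC resource name, and the real code reads the CIDR from wherever the cluster's VPC is actually defined:

# Sketch of the renamed output.
output "cluster_int_cidr" {
  value = aws_vpc.arvados.cidr_block
}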
Updated by Brett Smith over 1 year ago
Lucas Di Pentima wrote in #note-14:
Updates at 7c048d9 - branch 20482-installer-improvements
Should we add an upgrade note about the change to DOMAIN like we did for Terraform? I realize the Salt installer isn't intended to be re-run the same way that Terraform is, but it's not impossible to imagine someone has integrated it into some larger install process. Even just adding a quick sentence or two about this to the existing section would be fine, I think.
"For public long-lived clusters, we recommend reserving a cluster id." - I admittedly don't know much about our cluster id reservation process, so because of that, I'm not sure I understand why adding the word "public" is a good change. Clusters that start private could become public in the future. Wouldn't it be good to have their ids reserved to avoid potential collisions when that happens?
I'm not sure why this branch needs to make changes to .licenseignore. The Terraform vars includes already have license notices at the top. The Jinja templates seem like they could have license notices by having a block comment at the top. Is there some reason that's not a good idea?
Updated by Lucas Di Pentima over 1 year ago
Updates at 7b73360 (rebased branch)
Brett Smith wrote in #note-15:
Should we add an upgrade note about the change to DOMAIN like we did for Terraform? I realize the Salt installer isn't intended to be re-run the same way that Terraform is, but it's not impossible to imagine someone has integrated it into some larger install process. Even just adding a quick sentence or two about this to the existing section would be fine, I think.
Good point, I've updated the upgrade notice.
"For public long-lived clusters, we recommend reserving a cluster id." - I admittedly don't know much about our cluster id reservation process, so because of that, I'm not sure I understand why adding the word "public" is a good change. Clusters that start private could become public in the future. Wouldn't it be good to have their ids reserved to avoid potential collisions when that happens?
Yeah, it seemed to me a trivial doc fix, but given that there might be other points of view, I removed it and left it as it was before.
I'm not sure why this branch needs to make changes to .licenseignore. The Terraform vars includes already have license notices at the top. The Jinja templates seem like they could have license notices by having a block comment at the top. Is there some reason that's not a good idea?
Thanks, I've added the commented copyright notices and double-checked they work by deploying a test cluster. For the Terraform var symlinked files, there's some kind of issue in the copyright script: it should ignore the symlinks, but I don't yet understand why it isn't doing so in this case. I didn't want to spend more time on that for the moment, so I kept those files in .licenseignore.
Updated by Brett Smith over 1 year ago
Updated by Lucas Di Pentima over 1 year ago
- % Done changed from 66 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|763e5bd313592a1c1f161b80bc07c94a49f8fb91.