Bug #18101

closed

[a-d-c] [AWS] add option to spin up (spot) instances in more/all availability zones in the region

Added by Ward Vandewege over 2 years ago. Updated 3 days ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -
Release relationship: Auto

Description

When using spot instances on AWS, it is common to see a message like this in the a-d-c logs:

InsufficientInstanceCapacity: We currently do not have sufficient m5.8xlarge capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get m5.8xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1b, us-east-1c, us-east-1d, us-east-1f.

Currently, a-d-c requests compute instances with a specific subnet, which is tied to one availability zone, and we recommend that this zone be the same one the keepstores run in.

Traffic between availability zones in the same AWS region costs $0.02/GB (cf. https://aws.amazon.com/ec2/pricing/on-demand/#Data_Transfer_within_the_same_AWS_Region).

Once #16516 (run Keepstore on the compute node) is implemented, it will be advantageous to configure a cluster on AWS where (spot) instances are requested across multiple (all?) availability zones in a region. When a spot instance runs in a different AZ, there would be an extra cost of $0.02/GB for all traffic to/from the permanent EC2 instances (e.g. API server), but that traffic should be minimal (mostly crunchstat-summary log traffic).
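For a rough sense of scale, the extra cost can be estimated from the $0.02/GB rate quoted above. This is a minimal sketch; the function name and the traffic volume in the usage note are illustrative assumptions, not measurements from a real cluster.

```python
# Inter-AZ transfer rate quoted in this ticket (USD per GB).
CROSS_AZ_RATE_USD_PER_GB = 0.02


def cross_az_cost_usd(gigabytes: float, rate: float = CROSS_AZ_RATE_USD_PER_GB) -> float:
    """Estimate the charge for traffic that crosses an availability zone boundary."""
    if gigabytes < 0:
        raise ValueError("traffic volume must be non-negative")
    return gigabytes * rate
```

Under these assumptions, a node that exchanges 50 GB with the API server from another AZ would add about $1 to the bill.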

The Arvados configuration should support multiple subnets:

CloudVMs:
  Driver: ec2
  DriverParameters:
    SubnetIDs: ['subnet-...', 'subnet-...']

Alternatively, it would be nice if we could pass no AZ in the request. I'm not sure how that would work in the AWS SDK; presumably you would still have to supply a desired subnet. This needs a bit of investigation.
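One way the multiple-subnet option could behave is to try each configured subnet in turn and move on when AWS reports insufficient capacity. The sketch below only illustrates that idea and is not how a-d-c is implemented: `InsufficientCapacityError`, `launch_in_first_available_subnet`, and the `launch` callable are hypothetical names, and the subnet IDs are placeholders.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class InsufficientCapacityError(Exception):
    """Hypothetical stand-in for AWS's InsufficientInstanceCapacity error."""


def launch_in_first_available_subnet(
    subnet_ids: Sequence[str],
    launch: Callable[[str], T],
) -> T:
    """Try each subnet in order and return the first successful launch result.

    `launch` is a hypothetical callable that requests one instance in the
    given subnet and raises InsufficientCapacityError when that subnet's
    availability zone has no capacity.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    last_error: Optional[InsufficientCapacityError] = None
    for subnet_id in subnet_ids:
        try:
            return launch(subnet_id)
        except InsufficientCapacityError as err:
            last_error = err  # remember the failure and try the next subnet
    assert last_error is not None
    raise last_error  # every configured subnet was out of capacity


# Example with a fake launcher: the first placeholder subnet has no capacity.
def fake_launch(subnet_id: str) -> str:
    if subnet_id == "subnet-A":
        raise InsufficientCapacityError(subnet_id)
    return f"instance in {subnet_id}"


result = launch_in_first_available_subnet(["subnet-A", "subnet-B"], fake_launch)
```

Trying subnets in list order keeps the preferred AZ (for example, the one the keepstores run in) first, so cross-AZ traffic is only incurred when that zone has no capacity.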


Related issues

Related to Arvados Epics - Idea #18179: Better spot instance support (In Progress, 03/01/2022 to 06/30/2024)