Feature #16738

Updated by Ward Vandewege over 3 years ago

a-d-c currently does not take the instance type into consideration when it handles cloud quota. 

 In practice, cloud vendors enforce vCPU quotas per instance family (Azure) or per group of instance types (AWS), and in both cases per region.

 That means that if a-d-c hits a quota limit, it will stop starting any new nodes, even if the quota only applies to certain instance types or a specific instance family while a-d-c is configured to also use other instance types. 

 AWS has one combined vCPU limit for "standard" instances, i.e. the A, C, D, H, I, M, R, T, and Z types, and a separate vCPU limit each for G, Inf, P, X, and F instances.

 Azure has a vCPU limit per instance family (e.g. DSv2, Dv2).
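
 As a rough illustration of the AWS grouping described above, a partition name could be derived from the instance type prefix. This is only a sketch: the function name `aws_quota_partition` and the partition labels are hypothetical, it covers only the families listed in this ticket, and any other family is rejected rather than guessed. Azure families would more likely be assigned explicitly per node definition.

```python
import re

# Grouping taken from the ticket text; families not listed there are
# deliberately left unmapped rather than guessed.
AWS_STANDARD_FAMILIES = {"a", "c", "d", "h", "i", "m", "r", "t", "z"}
AWS_SEPARATE_FAMILIES = {"g", "inf", "p", "x", "f"}


def aws_quota_partition(instance_type: str) -> str:
    """Return a (hypothetical) quota partition name for an AWS instance type.

    The family is taken to be the leading run of letters, e.g. "m" for
    "m5.large" or "inf" for "inf1.xlarge".
    """
    match = re.match(r"[a-z]+", instance_type.lower())
    if match is None:
        raise ValueError(f"unrecognized instance type: {instance_type!r}")
    family = match.group(0)
    if family in AWS_STANDARD_FAMILIES:
        # All "standard" families share one combined vCPU limit.
        return "standard"
    if family in AWS_SEPARATE_FAMILIES:
        # Each of these families has its own vCPU limit.
        return family
    raise ValueError(f"no known quota partition for family {family!r}")
```

 With this, `m5.large` and `c5d.large` land in the same "standard" partition, while `g4dn.xlarge` and `inf1.xlarge` each get their own.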

 We need a "quota partition" concept. Node definitions should be assigned a quota partition, and a-d-c should take those partitions into account when dealing with quota. 

 Azure also has a "global" limit on vCPUs across all families, but so far we've never seen them set that limit lower than the sum of the per-family limits. Quotas seem to be in place to ensure availability of hardware, not to limit spending on a customer's part.

 It's not clear whether we could distinguish a global quota error from a per-family one, but it's probably not worth worrying about: if a-d-c were to hit the global quota, every quota partition would end up marked as 'at quota', and that seems appropriate.
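
 The quota partition idea, including the global quota case, could be sketched roughly as follows. This is not a-d-c code: `NodeDefinition`, `QuotaTracker`, and all method names are hypothetical, and clearing the 'at quota' mark after a successful node start is an assumption, since this ticket does not say when a partition should be considered usable again.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeDefinition:
    """Hypothetical node definition carrying its assigned quota partition."""
    name: str
    vcpus: int
    quota_partition: str


@dataclass
class QuotaTracker:
    """Tracks which quota partitions are currently considered 'at quota'."""
    at_quota: set = field(default_factory=set)

    def record_quota_error(self, node: NodeDefinition) -> None:
        # A quota error only marks the failing node's own partition.
        self.at_quota.add(node.quota_partition)

    def record_success(self, node: NodeDefinition) -> None:
        # Assumption: a successful start means the partition has headroom.
        self.at_quota.discard(node.quota_partition)

    def startable(self, nodes: list) -> list:
        # Only node definitions in partitions not at quota are candidates.
        return [n for n in nodes if n.quota_partition not in self.at_quota]


nodes = [
    NodeDefinition("Standard_D2s_v3", 2, "DSv3"),
    NodeDefinition("Standard_D4_v2", 8, "Dv2"),
]
tracker = QuotaTracker()

# A per-family quota error leaves the other family usable.
tracker.record_quota_error(nodes[0])
still_usable = [n.name for n in tracker.startable(nodes)]

# If the global quota is hit, attempts in the remaining partitions fail
# too, so every partition ends up marked and nothing is startable.
tracker.record_quota_error(nodes[1])
after_global = tracker.startable(nodes)
```

 After the first error only `Standard_D4_v2` remains startable; after the second, `after_global` is empty, which matches the "all partitions at quota" outcome described above.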
