Project

General

Profile

Actions

Feature #16739

closed

[a-d-c] throttle option for concurrent VM create requests

Added by Ward Vandewege over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
-
Release relationship:
Auto

Description

Add an a-d-c throttle field (not at the driver level) for concurrent create requests, because it may be useful for other providers, too.

Reuse the quota limit mechanism in a-d-c. Default the throttle to 20 for Azure in our config example.

See #16625 for the rationale behind this.


Subtasks 1 (0 open1 closed)

Task #16769: Review 16739-concurrent-node-create-throttleResolvedWard Vandewege08/31/2020Actions

Related issues

Related to Arvados - Feature #16625: Support creating VMs from Azure "image" resource not just bare VHDResolvedWard Vandewege08/17/2020Actions
Actions #1

Updated by Ward Vandewege over 3 years ago

  • Related to Feature #16625: Support creating VMs from Azure "image" resource not just bare VHD added
Actions #2

Updated by Tom Clegg over 3 years ago

  • Assigned To set to Ward Vandewege
Actions #3

Updated by Ward Vandewege over 3 years ago

Ready for review at baa1f256924655d67b704f35981e9839743fab99 on branch 16739-concurrent-node-create-throttle

Actions #4

Updated by Ward Vandewege over 3 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Tom Clegg over 3 years ago

I think you could skip CheckRateLimitError (which would log a slightly misleading message "suspending remote calls due to rate-limit error") and do this

logger.Info("reached MaxConcurrentNodeCreateOps")
wp.instanceSet.throttleCreate.ErrorUntil(errors.New("reached MaxConcurrentNodeCreateOps"), time.Now().Add(5*time.Second), wp.notify)
return false

...also eliminating the new quotaerror type.

Log message should probably use the snakecase found in config since that's what admin will need to look for.

How about "MaxConcurrentInstanceCreateOps" instead of ...NodeCreateOps? I think we should stick with one term for a given thing in config keys & docs -- the section is called CloudVMs and other config keys use the term "instances", which is probably best since Azure/GCP/AWS all refer to "virtual machine instances".

(Some of the comments already say "workers" where they should probably say "instances" instead... but at least they're just comments.)

Nit: The "(azure)" stuff in the config-defaults comments could be misconstrued to mean that the feature only applies to Azure. Maybe more explicit, like "This is recommended by Azure in certain scenarios (see {link}) and can be used with other cloud providers too, if desired."?

Meanwhile, in existing code... I noticed the worker struct had its own throttleCreate and throttleInstances, unused except for a bogus call to wp.throttleCreate.Error() in Create() that clearly should have been checking wp.instanceSet.throttleCreate.Error() instead. Pushed a fix to your branch.

Rest LGTM. I like the test!

Actions #6

Updated by Ward Vandewege over 3 years ago

Tom Clegg wrote:

I think you could skip CheckRateLimitError (which would log a slightly misleading message "suspending remote calls due to rate-limit error") and do this

[...]

...also eliminating the new quotaerror type.

Ah, nice, I didn't like having to add that type.

Log message should probably use the snakecase found in config since that's what admin will need to look for.

Yes!

How about "MaxConcurrentInstanceCreateOps" instead of ...NodeCreateOps? I think we should stick with one term for a given thing in config keys & docs -- the section is called CloudVMs and other config keys use the term "instances", which is probably best since Azure/GCP/AWS all refer to "virtual machine instances".

Sure, done. Consistency is good.

(Some of the comments already say "workers" where they should probably say "instances" instead... but at least they're just comments.)

I didn't fix that here, not sure which comments you meant.

Nit: The "(azure)" stuff in the config-defaults comments could be misconstrued to mean that the feature only applies to Azure. Maybe more explicit, like "This is recommended by Azure in certain scenarios (see {link}) and can be used with other cloud providers too, if desired."?

Yes, that's better, fixed.

Meanwhile, in existing code... I noticed the worker struct had its own throttleCreate and throttleInstances, unused except for a bogus call to wp.throttleCreate.Error() in Create() that clearly should have been checking wp.instanceSet.throttleCreate.Error() instead. Pushed a fix to your branch.

Thanks!

Rest LGTM. I like the test!

Cool. The change to throttleCreate.ErrorUntil meant that I needed a way to reset the error in the test, so I added a ResetError() function on throttle.

Ready for another look at a5fef23f2863cd0183ff596f4579110e2ddb3b3d on branch 16739-concurrent-node-create-throttle

Actions #7

Updated by Ward Vandewege over 3 years ago

Cool. The change to throttleCreate.ErrorUntil meant that I needed a way to reset the error in the test, so I added a ResetError() function on throttle.

I removed that function in b108bdfa0f3c74239fa565a1d14db945eb4dcf18 and set the err value directly in the test.

Actions #8

Updated by Tom Clegg over 3 years ago

a-d-c install doc page needs s/MaxConcurrentNodeCreateOps/MaxConcurrentInstanceCreateOps/

Rest LGTM, thanks!

Actions #9

Updated by Ward Vandewege over 3 years ago

Tom Clegg wrote:

a-d-c install doc page needs s/MaxConcurrentNodeCreateOps/MaxConcurrentInstanceCreateOps/

Rest LGTM, thanks!

Fixed and merged, thanks.

Actions #10

Updated by Ward Vandewege over 3 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #11

Updated by Peter Amstutz over 3 years ago

  • Release set to 25
Actions

Also available in: Atom PDF