Feature #16739

[a-d-c] throttle option for concurrent VM create requests

Added by Ward Vandewege about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
08/31/2020
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Add an a-d-c throttle field (not at the driver level) for concurrent create requests, because it may be useful for other providers, too.

Reuse the quota limit mechanism in a-d-c. Default the throttle to 20 for Azure in our config example.

See #16625 for the rationale behind this.


Subtasks

Task #16769: Review 16739-concurrent-node-create-throttleResolvedWard Vandewege


Related issues

Related to Arvados - Feature #16625: Support creating VMs from Azure "image" resource not just bare VHDResolved08/17/2020

Associated revisions

Revision 0ff4ed45
Added by Ward Vandewege about 1 year ago

Merge branch '16739-concurrent-node-create-throttle'

closes #16739

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

History

#1 Updated by Ward Vandewege about 1 year ago

  • Related to Feature #16625: Support creating VMs from Azure "image" resource not just bare VHD added

#2 Updated by Tom Clegg about 1 year ago

  • Assigned To set to Ward Vandewege

#3 Updated by Ward Vandewege about 1 year ago

Ready for review at baa1f256924655d67b704f35981e9839743fab99 on branch 16739-concurrent-node-create-throttle

#4 Updated by Ward Vandewege about 1 year ago

  • Status changed from New to In Progress

#5 Updated by Tom Clegg about 1 year ago

I think you could skip CheckRateLimitError (which would log a slightly misleading message "suspending remote calls due to rate-limit error") and do this

logger.Info("reached MaxConcurrentNodeCreateOps")
wp.instanceSet.throttleCreate.ErrorUntil(errors.New("reached MaxConcurrentNodeCreateOps"), time.Now().Add(5*time.Second), wp.notify)
return false

...also eliminating the new quotaerror type.

Log message should probably use the snakecase found in config since that's what admin will need to look for.

How about "MaxConcurrentInstanceCreateOps" instead of ...NodeCreateOps? I think we should stick with one term for a given thing in config keys & docs -- the section is called CloudVMs and other config keys use the term "instances", which is probably best since Azure/GCP/AWS all refer to "virtual machine instances".

(Some of the comments already say "workers" where they should probably say "instances" instead... but at least they're just comments.)

Nit: The "(azure)" stuff in the config-defaults comments could be misconstrued to mean that the feature only applies to Azure. Maybe more explicit, like "This is recommended by Azure in certain scenarios (see {link}) and can be used with other cloud providers too, if desired."?

Meanwhile, in existing code... I noticed the worker struct had its own throttleCreate and throttleInstances, unused except for a bogus call to wp.throttleCreate.Error() in Create() that clearly should have been checking wp.instanceSet.throttleCreate.Error() instead. Pushed a fix to your branch.

Rest LGTM. I like the test!

#6 Updated by Ward Vandewege about 1 year ago

Tom Clegg wrote:

I think you could skip CheckRateLimitError (which would log a slightly misleading message "suspending remote calls due to rate-limit error") and do this

[...]

...also eliminating the new quotaerror type.

Ah, nice, I didn't like having to add that type.

Log message should probably use the snakecase found in config since that's what admin will need to look for.

Yes!

How about "MaxConcurrentInstanceCreateOps" instead of ...NodeCreateOps? I think we should stick with one term for a given thing in config keys & docs -- the section is called CloudVMs and other config keys use the term "instances", which is probably best since Azure/GCP/AWS all refer to "virtual machine instances".

Sure, done. Consistency is good.

(Some of the comments already say "workers" where they should probably say "instances" instead... but at least they're just comments.)

I didn't fix that here, not sure which comments you meant.

Nit: The "(azure)" stuff in the config-defaults comments could be misconstrued to mean that the feature only applies to Azure. Maybe more explicit, like "This is recommended by Azure in certain scenarios (see {link}) and can be used with other cloud providers too, if desired."?

Yes, that's better, fixed.

Meanwhile, in existing code... I noticed the worker struct had its own throttleCreate and throttleInstances, unused except for a bogus call to wp.throttleCreate.Error() in Create() that clearly should have been checking wp.instanceSet.throttleCreate.Error() instead. Pushed a fix to your branch.

Thanks!

Rest LGTM. I like the test!

Cool. The change to throttleCreate.ErrorUntil meant that I needed a way to reset the error in the test, so I added a ResetError() function on throttle.

Ready for another look at a5fef23f2863cd0183ff596f4579110e2ddb3b3d on branch 16739-concurrent-node-create-throttle

#7 Updated by Ward Vandewege about 1 year ago

Cool. The change to throttleCreate.ErrorUntil meant that I needed a way to reset the error in the test, so I added a ResetError() function on throttle.

I removed that function in b108bdfa0f3c74239fa565a1d14db945eb4dcf18 and set the err value directly in the test.

#8 Updated by Tom Clegg about 1 year ago

a-d-c install doc page needs s/MaxConcurrentNodeCreateOps/MaxConcurrentInstanceCreateOps/

Rest LGTM, thanks!

#9 Updated by Ward Vandewege about 1 year ago

Tom Clegg wrote:

a-d-c install doc page needs s/MaxConcurrentNodeCreateOps/MaxConcurrentInstanceCreateOps/

Rest LGTM, thanks!

Fixed and merged, thanks.

#10 Updated by Ward Vandewege about 1 year ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

#11 Updated by Peter Amstutz about 1 year ago

  • Release set to 25

Also available in: Atom PDF