Feature #16739
closed[a-d-c] throttle option for concurrent VM create requests
Added by Ward Vandewege over 4 years ago. Updated over 4 years ago.
Description
Add an a-d-c throttle field (not at the driver level) for concurrent create requests, because it may be useful for other providers, too.
Reuse the quota limit mechanism in a-d-c. Default the throttle to 20 for Azure in our config example.
See #16625 for the rationale behind this.
Updated by Ward Vandewege over 4 years ago
- Related to Feature #16625: Support creating VMs from Azure "image" resource not just bare VHD added
Updated by Ward Vandewege over 4 years ago
Ready for review at baa1f256924655d67b704f35981e9839743fab99 on branch 16739-concurrent-node-create-throttle
Updated by Ward Vandewege over 4 years ago
- Status changed from New to In Progress
Updated by Tom Clegg over 4 years ago
I think you could skip CheckRateLimitError (which would log a slightly misleading message "suspending remote calls due to rate-limit error") and do this
logger.Info("reached MaxConcurrentNodeCreateOps") wp.instanceSet.throttleCreate.ErrorUntil(errors.New("reached MaxConcurrentNodeCreateOps"), time.Now().Add(5*time.Second), wp.notify) return false
...also eliminating the new quotaerror type.
Log message should probably use the snakecase found in config since that's what admin will need to look for.
How about "MaxConcurrentInstanceCreateOps" instead of ...NodeCreateOps? I think we should stick with one term for a given thing in config keys & docs -- the section is called CloudVMs and other config keys use the term "instances", which is probably best since Azure/GCP/AWS all refer to "virtual machine instances".
(Some of the comments already say "workers" where they should probably say "instances" instead... but at least they're just comments.)
Nit: The "(azure)" stuff in the config-defaults comments could be misconstrued to mean that the feature only applies to Azure. Maybe more explicit, like "This is recommended by Azure in certain scenarios (see {link}) and can be used with other cloud providers too, if desired."?
Meanwhile, in existing code... I noticed the worker struct had its own throttleCreate and throttleInstances, unused except for a bogus call to wp.throttleCreate.Error() in Create() that clearly should have been checking wp.instanceSet.throttleCreate.Error() instead. Pushed a fix to your branch.
Rest LGTM. I like the test!
Updated by Ward Vandewege over 4 years ago
Tom Clegg wrote:
I think you could skip CheckRateLimitError (which would log a slightly misleading message "suspending remote calls due to rate-limit error") and do this
[...]
...also eliminating the new quotaerror type.
Ah, nice, I didn't like having to add that type.
Log message should probably use the snakecase found in config since that's what admin will need to look for.
Yes!
How about "MaxConcurrentInstanceCreateOps" instead of ...NodeCreateOps? I think we should stick with one term for a given thing in config keys & docs -- the section is called CloudVMs and other config keys use the term "instances", which is probably best since Azure/GCP/AWS all refer to "virtual machine instances".
Sure, done. Consistency is good.
(Some of the comments already say "workers" where they should probably say "instances" instead... but at least they're just comments.)
I didn't fix that here, not sure which comments you meant.
Nit: The "(azure)" stuff in the config-defaults comments could be misconstrued to mean that the feature only applies to Azure. Maybe more explicit, like "This is recommended by Azure in certain scenarios (see {link}) and can be used with other cloud providers too, if desired."?
Yes, that's better, fixed.
Meanwhile, in existing code... I noticed the worker struct had its own throttleCreate and throttleInstances, unused except for a bogus call to wp.throttleCreate.Error() in Create() that clearly should have been checking wp.instanceSet.throttleCreate.Error() instead. Pushed a fix to your branch.
Thanks!
Rest LGTM. I like the test!
Cool. The change to throttleCreate.ErrorUntil meant that I needed a way to reset the error in the test, so I added a ResetError() function on throttle.
Ready for another look at a5fef23f2863cd0183ff596f4579110e2ddb3b3d on branch 16739-concurrent-node-create-throttle
Updated by Ward Vandewege over 4 years ago
Cool. The change to throttleCreate.ErrorUntil meant that I needed a way to reset the error in the test, so I added a ResetError() function on throttle.
I removed that function in b108bdfa0f3c74239fa565a1d14db945eb4dcf18 and set the err value directly in the test.
Updated by Tom Clegg over 4 years ago
a-d-c install doc page needs s/MaxConcurrentNodeCreateOps/MaxConcurrentInstanceCreateOps/
Rest LGTM, thanks!
Updated by Ward Vandewege over 4 years ago
Tom Clegg wrote:
a-d-c install doc page needs
s/MaxConcurrentNodeCreateOps/MaxConcurrentInstanceCreateOps/
Rest LGTM, thanks!
Fixed and merged, thanks.
Updated by Ward Vandewege over 4 years ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|0ff4ed45a7ab1730118eadfb92ddea7d332f0328.