Bug #20474
closedEnsuring metrics are available during busy times
Description
Nginx configuration should be API server request limit + a little extra, because the metrics requests are not subject to the API request limit but we need to leave some headroom.
Updated by Tom Clegg over 1 year ago
- Assigned To changed from Tom Clegg to Lucas Di Pentima
Added a check in 3faf9753f to confirm that service processes continue to handle incoming /metrics and /_inspect/requests when normal requests are being rejected due to MaxConcurrentRequests.
Still need to check that the downstream Nginx config isn't preventing them from getting through to the arvados service processes in the first place.
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-05-10 sprint to Development 2023-05-24 sprint
Updated by Lucas Di Pentima over 1 year ago
Updates at e8a6dc77f - branch 20474-installer-api-reqs-limit
These changes affect the multi host salt installer:
- Adds 5 additional requests to RailsAPI's request queue when
CONTROLLER_MAX_CONCURRENT_REQUESTS
is set onlocal.params
. - Also, does some basic type checking.
I've done manual testing on a test cluster to confirm this works.
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-05-24 sprint to Development 2023-06-07
Updated by Brett Smith over 1 year ago
Lucas Di Pentima wrote in #note-6:
Updates at e8a6dc77f - branch
20474-installer-api-reqs-limit
I'm not convinced we need this. My current understanding is:
passenger_max_request_queue_size
only affects services that have Passenger enabled.- The only server that has
passenger_enabled on
is RailsAPI. No other service sets that, and the default is off, so they should be unaffected by Passenger or its configuration at all. - I don't see us setting any other nginx configuration that would limit incoming connections.
That said, this is all new to me, so maybe there's something I'm missing? Did you specifically test before the branch whether a change is necessary to allow metrics connections beyond MaxConcurrentRequests
? Can you say more about the testing process in general?
Thanks.
Updated by Lucas Di Pentima over 1 year ago
My understanding is the following:
The MaxConcurrentRequests
config knob controls the controller
service request queue size, while the passenger_max_request_queue_size
config from nginx
controls the same thing on the RailsAPI side.
As the majority of the request that comes from clients on the controller
service translates to a subsequent request from controller
to RailsAPI
, this was set on a 1:1 ratio, but if we also want to make additional requests directly to RailsAPI (for other purposes like retrieving its metrics), we need some slack on RailsAPI's request queue, so that when controller's getting hammered by clients and cannot queue any more requests, RailsAPI can accept some additional ones.
Does this make sense or maybe I'm misunderstanding your feedback?
Updated by Brett Smith over 1 year ago
Lucas Di Pentima wrote in #note-10:
As the majority of the request that comes from clients on the
controller
service translates to a subsequent request fromcontroller
toRailsAPI
, this was set on a 1:1 ratio, but if we also want to make additional requests directly to RailsAPI (for other purposes like retrieving its metrics), we need some slack on RailsAPI's request queue, so that when controller's getting hammered by clients and cannot queue any more requests, RailsAPI can accept some additional ones.Does this make sense or maybe I'm misunderstanding your feedback?
I think we're not on the same page about the functional requirements of this ticket. The devs made a change to ensure that requests to /metrics
and /_inspect/requests
specifically are not counted against MaxConcurrentRequests
. My understanding is this ticket is about making sure that nginx does not do its own enforcement of that limit.
I'm saying, I think that's already the case in current main. RailsAPI does not support either of these endpoints, controller handles them internally, so there is no need to increase the Passenger queue size limit to accommodate these specific requests. And I don't see anything else in the current nginx configuration that would put a limit on controller or other servers that do support the endpoint: Passenger queue size limit shouldn't affect other servers, and there's no other nginx configuration that would limit incoming requests.
If you understood the ticket differently we might need to get clarification from Peter or Tom.
Updated by Brett Smith over 1 year ago
Brett Smith wrote in #note-11:
RailsAPI does not support either of these endpoints, controller handles them internally, so there is no need to increase the Passenger queue size limit to accommodate these specific requests.
This is what the Admin guide docs say but those docs are out-of-date. RailsAPI support for /metrics
was added a year ago. With that, the changes in the branch make sense, and then my comments are:
- Adding a comment near the Passenger request queue limit to explain that we add this overhead for metrics seems like a nice affordance, given how much time we've spent on it and it could easily just seem like "generic overhead" otherwise.
- I don't feel too strongly about this, I wouldn't hold up a merge over it, but I do think it would be nice to set the overhead as a percentage of
max_reqs
rather than a static 5, that way it scales nicely to different sizes of cluster. If it was me I'd probably do 10%? But anything 5%-25% seems reasonable.
Thanks.
Updated by Lucas Di Pentima over 1 year ago
Updates at 61a43214b
- Added explaining comment on nginx's passenger config file.
- Changed passenger's queue size from adding 5 to adding 10%.
- Fixed documentation adding a checkmark for RailsAPI's
/metrics
support.
Updated by Brett Smith over 1 year ago
Updated by Lucas Di Pentima over 1 year ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|0e014c36f3a1dc3a24143804deec2a3f3869ee63.