Project

General

Profile

Actions

Bug #20474

closed

Ensuring metrics are available during busy times

Added by Peter Amstutz 11 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
0.5
Release relationship:
Auto

Description

Nginx configuration should be API server request limit + a little extra, because the metrics requests are not subject to the API request limit but we need to leave some headroom.


Subtasks 1 (0 open1 closed)

Task #20534: Review 20474-installer-api-reqs-limitResolvedLucas Di Pentima05/22/2023Actions
Actions #1

Updated by Peter Amstutz 11 months ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #3

Updated by Tom Clegg 11 months ago

  • Assigned To set to Tom Clegg
Actions #4

Updated by Tom Clegg 11 months ago

  • Assigned To changed from Tom Clegg to Lucas Di Pentima

Added a check in 3faf9753f to confirm that service processes continue to handle incoming /metrics and /_inspect/requests when normal requests are being rejected due to MaxConcurrentRequests.

Still need to check that the downstream Nginx config isn't preventing them from getting through to the arvados service processes in the first place.

Actions #5

Updated by Peter Amstutz 11 months ago

  • Target version changed from Development 2023-05-10 sprint to Development 2023-05-24 sprint
Actions #6

Updated by Lucas Di Pentima 10 months ago

Updates at e8a6dc77f - branch 20474-installer-api-reqs-limit

These changes affect the multi host salt installer:

  • Adds 5 additional requests to RailsAPI's request queue when CONTROLLER_MAX_CONCURRENT_REQUESTS is set on local.params.
  • Also, does some basic type checking.

I've done manual testing on a test cluster to confirm this works.

Actions #7

Updated by Lucas Di Pentima 10 months ago

  • Story points set to 0.5
Actions #8

Updated by Peter Amstutz 10 months ago

  • Target version changed from Development 2023-05-24 sprint to Development 2023-06-07
Actions #9

Updated by Brett Smith 10 months ago

Lucas Di Pentima wrote in #note-6:

Updates at e8a6dc77f - branch 20474-installer-api-reqs-limit

I'm not convinced we need this. My current understanding is:

  • passenger_max_request_queue_size only affects services that have Passenger enabled.
  • The only server that has passenger_enabled on is RailsAPI. No other service sets that, and the default is off, so they should be unaffected by Passenger or its configuration at all.
  • I don't see us setting any other nginx configuration that would limit incoming connections.

That said, this is all new to me, so maybe there's something I'm missing? Did you specifically test before the branch whether a change is necessary to allow metrics connections beyond MaxConcurrentRequests? Can you say more about the testing process in general?

Thanks.

Actions #10

Updated by Lucas Di Pentima 10 months ago

My understanding is the following:

The MaxConcurrentRequests config knob controls the controller service request queue size, while the passenger_max_request_queue_size config from nginx controls the same thing on the RailsAPI side.
As the majority of the request that comes from clients on the controller service translates to a subsequent request from controller to RailsAPI, this was set on a 1:1 ratio, but if we also want to make additional requests directly to RailsAPI (for other purposes like retrieving its metrics), we need some slack on RailsAPI's request queue, so that when controller's getting hammered by clients and cannot queue any more requests, RailsAPI can accept some additional ones.

Does this make sense or maybe I'm misunderstanding your feedback?

Actions #11

Updated by Brett Smith 10 months ago

Lucas Di Pentima wrote in #note-10:

As the majority of the request that comes from clients on the controller service translates to a subsequent request from controller to RailsAPI, this was set on a 1:1 ratio, but if we also want to make additional requests directly to RailsAPI (for other purposes like retrieving its metrics), we need some slack on RailsAPI's request queue, so that when controller's getting hammered by clients and cannot queue any more requests, RailsAPI can accept some additional ones.

Does this make sense or maybe I'm misunderstanding your feedback?

I think we're not on the same page about the functional requirements of this ticket. The devs made a change to ensure that requests to /metrics and /_inspect/requests specifically are not counted against MaxConcurrentRequests. My understanding is this ticket is about making sure that nginx does not do its own enforcement of that limit.

I'm saying, I think that's already the case in current main. RailsAPI does not support either of these endpoints, controller handles them internally, so there is no need to increase the Passenger queue size limit to accommodate these specific requests. And I don't see anything else in the current nginx configuration that would put a limit on controller or other servers that do support the endpoint: Passenger queue size limit shouldn't affect other servers, and there's no other nginx configuration that would limit incoming requests.

If you understood the ticket differently we might need to get clarification from Peter or Tom.

Actions #12

Updated by Brett Smith 10 months ago

Brett Smith wrote in #note-11:

RailsAPI does not support either of these endpoints, controller handles them internally, so there is no need to increase the Passenger queue size limit to accommodate these specific requests.

This is what the Admin guide docs say but those docs are out-of-date. RailsAPI support for /metrics was added a year ago. With that, the changes in the branch make sense, and then my comments are:

  • Adding a comment near the Passenger request queue limit to explain that we add this overhead for metrics seems like a nice affordance, given how much time we've spent on it and it could easily just seem like "generic overhead" otherwise.
  • I don't feel too strongly about this, I wouldn't hold up a merge over it, but I do think it would be nice to set the overhead as a percentage of max_reqs rather than a static 5, that way it scales nicely to different sizes of cluster. If it was me I'd probably do 10%? But anything 5%-25% seems reasonable.

Thanks.

Actions #13

Updated by Lucas Di Pentima 10 months ago

Updates at 61a43214b

  • Added explaining comment on nginx's passenger config file.
  • Changed passenger's queue size from adding 5 to adding 10%.
  • Fixed documentation adding a checkmark for RailsAPI's /metrics support.
Actions #14

Updated by Brett Smith 10 months ago

Lucas Di Pentima wrote in #note-13:

Updates at 61a43214b

Thanks, this looks good to me.

Actions #15

Updated by Lucas Di Pentima 10 months ago

  • Status changed from In Progress to Resolved
Actions #16

Updated by Peter Amstutz 7 months ago

  • Release set to 66
Actions

Also available in: Atom PDF