Bug #21285
closedAdd MaxGatewayTunnels config, separate from MaxConcurrentRequests
Added by Tom Clegg 12 months ago. Updated 10 months ago.
Description
Currently N running containers will start N gateway tunnels that occupy N of the MaxConcurrentRequests slots, even though they don't use the resources that MaxConcurrentRequests is meant to protect (mainly RailsAPI and PostgreSQL). Since each one stays open for the entire duration of the respective container, these tunnel connections can easily consume most/all of the MaxConcurrentRequests slots, leaving none for workbench2 or even other API calls from the containers themselves.
To address this, we should add a separate MaxGatewayTunnels config. Incoming gateway_tunnel requests should not occupy MaxConcurrentRequests slots. After reaching the MaxGatewayTunnels limit, additional gateway_tunnel requests should return 503 immediately rather than wait in a queue. Crunch-run should delay and retry when this happens.
Nginx and load balancers will be expected to allow {MaxConcurrentRequests + MaxQueuedRequests + MaxGatewayTunnels} concurrent requests. Documentation and installer should be updated accordingly.
Files
queue-names.png (28.7 KB) queue-names.png | Lucas Di Pentima, 01/09/2024 10:06 PM |
Related issues
Updated by Peter Amstutz 11 months ago
On the topic of tying up request slots --
Requests that rely on external services (e.g. proxying to keep-web, or federation) other than the API server and/or local database should also be tracked and limited separately. There is a known pattern where certain requests to go out to the other service, and then that service contacts the API server back to verify the token, fetch user info, fetch a collection manifest, etc. This can result in deadlock if all the request slots are tied up waiting on external services.
Updated by Tom Clegg 11 months ago
- Related to Bug #21319: Avoid waiting/deadlock when a controller handler performs subrequests against the same controller added
Updated by Tom Clegg 11 months ago
- Assigned To set to Tom Clegg
- Status changed from New to In Progress
21285-max-gw-tunnels @ a5af8485c9658e13c6ef40981138731b0db35a68 -- developer-run-tests: #3978
retry wb2 developer-run-tests-services-workbench2: #118
I reworked httpserver.RequestLimiter to have the caller create a number of RequestQueue objects, and provide a function to assign each incoming request to one of them. Each RequestQueue specifies its own MaxConcurrent, MaxQueue, and MaxQueueTimeForMinPriority, as well as a label used for reporting in metrics.
This means we now have a set of metrics for each queue ("api" and "tunnel"), which can be graphed separately and/or in aggregate.
-arvados_concurrent_requests 54
-arvados_max_concurrent_requests 54
-arvados_max_queued_requests 1
-arvados_queue_delay_seconds_count{priority="normal"} 54
-arvados_queue_timeout_seconds_count{priority="normal"} 0
-arvados_queued_requests{priority="normal"} 1
+arvados_concurrent_requests{queue="api"} 24
+arvados_concurrent_requests{queue="tunnel"} 30
+arvados_max_concurrent_requests{queue="api"} 24
+arvados_max_concurrent_requests{queue="tunnel"} 30
+arvados_max_queued_requests{queue="api"} 1
+arvados_max_queued_requests{queue="tunnel"} 0
+arvados_queue_delay_seconds_count{priority="normal",queue="api"} 24
+arvados_queue_delay_seconds_count{priority="normal",queue="tunnel"} 30
+arvados_queue_timeout_seconds_count{priority="normal",queue="api"} 0
+arvados_queue_timeout_seconds_count{priority="normal",queue="tunnel"} 0
+arvados_queued_requests{priority="normal",queue="api"} 1
+arvados_queued_requests{priority="normal",queue="tunnel"} 0
The "tunnel" queue is used for container shell SSH sessions (client->controller->crunchrun) and container gateway tunnels (crunchrun->controller).
- ✅ All agreed upon points are implemented / addressed.
- New config is
MaxGatewayTunnels
- ✅ Crunch-run should delay and retry if tunnel setup endpoint responds 503 (we already had this - crunch-run waits 5s and retries after any error/disconnection)
- New config is
- ✅ Anything not implemented (discovered or discussed during work) has a follow-up story.
- ✅ Code is tested and passing, both automated and manual, what manual testing was done is described
- ✅ Documentation has been updated.
- ✅ Behaves appropriately at the intended scale (describe intended scale).
- ✅ Considered backwards and forwards compatibility issues between client and server.
- ✅ Follows our coding standards and GUI style guidelines.
- ☐ Need to update Nginx config in salt installer?
Updated by Lucas Di Pentima 11 months ago
- If every running container requires a tunnel slot, I think that the
MaxGatewayTunnels
config would behave as an upper limit to theContainers.CloudVMs.MaxInstances
setting. Or putting it in another way: in the cloud scenario both config knobs are kind of equivalent (without counting the container shell sessions). Is that a correct understanding?- Should we check that
MaxGatewayTunnels
is at least equal to (or greater than)MaxInstances
to avoid surprising behaviors? - Do you think we could simplify configuration by setting
MaxGatewayTunnels
's default to something related to whateverMaxInstances
has?
- Should we check that
- The docs update doesn't include any mention on
a-d-c
's section, is that on purpose? Or maybe this branch is only HPC related?- If containers only use tunnel queue slots on HPC clusters, I think it would be convenient to also mention it on the config file.
- For changes on the installer, I will need to wait for this branch to be merged, so that I can do some testing with dev packages.
- I think the grafana dashboard will also need some tweaking.
Updated by Tom Clegg 11 months ago
Lucas Di Pentima wrote in #note-5:
- If every running container requires a tunnel slot, ...
Sorry, the note I put in the config file was not correct. The crunchrun-to-controller tunnel is not set up when using the cloud dispatcher, so this does only apply on HPC, where MaxInstances doesn't apply. I've updated the note in the config file:
# Maximum number of active gateway tunnel connections. One slot
# is consumed by each "container shell" connection. If using an
# HPC dispatcher (LSF or Slurm), one slot is consumed by each
# running container. These do not count toward
# MaxConcurrentRequests.
MaxGatewayTunnels: 1000
- For changes on the installer, I will need to wait for this branch to be merged, so that I can do some testing with dev packages.
- I think the grafana dashboard will also need some tweaking.
SGTM
21285-max-gw-tunnels @ f0908fd5878be533d1208c6a528459945524256b
Updated by Lucas Di Pentima 11 months ago
Tom Clegg wrote in #note-6:
Sorry, the note I put in the config file was not correct. The crunchrun-to-controller tunnel is not set up when using the cloud dispatcher, so this does only apply on HPC, where MaxInstances doesn't apply. I've updated the note in the config file:
21285-max-gw-tunnels @ f0908fd5878be533d1208c6a528459945524256b
Thanks for the clarification, with that I think 21285-max-gw-tunnels
LGTM.
Updated by Lucas Di Pentima 11 months ago
Updates at ce3c4f8 - branch 21285-installer-updates
- Adds a new optional
CONTROLLER_MAX_GATEWAY_TUNNELS
variable inlocal.params
's performance tuning section. - Adds
API.MaxGatewayTunnels
config with default value set to 1000. - Updates balancer & controller nginx configs to account for this new value.
Manually tested on our AWS sandbox account.
Pending: Grafana dashboard update.
Updated by Lucas Di Pentima 11 months ago
- File queue-names.png queue-names.png added
Update at 9f6c2250f6
- Adds
_{{queue}}
to the labels on the concurrent request graph so that new queues get identified on demand.
Updated by Tom Clegg 10 months ago
My only question is about the rlimit and worker_connections calculations:
worker_rlimit_nofile: {{ (max_reqs * 3 + max_tunnels) * controller_nr }}
worker_rlimit_nofile: {{ max_reqs * 3 + 1 + max_tunnels }}
I think we want to add max_tunnels*3
to nofile here: one fd for the incoming connection + one for the outgoing connection to controller + one from controller back to passenger for a permission check.
Also, questioning the existing logic: does nginx really only use one fd for an incoming connection from controller to passenger/rails? Or does it use one for controller->vhost + one for vhost->upstream? Just wondering whether we should be multiplying by 4 instead of 3 here. AFAIK there is no real reason to be stingy with file descriptors once we're going to the trouble to increase the limit...?
Updated by Lucas Di Pentima 10 months ago
After discussing on chat, I've applied the following diff at ba1937c21e
diff --git a/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_balancer_configuration.sls b/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_balancer_configuration.sls
index 25027b8571..485cf9c12f 100644
--- a/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_balancer_configuration.sls
+++ b/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_balancer_configuration.sls
@@ -16,9 +16,9 @@ nginx:
### SERVER
server:
config:
- worker_rlimit_nofile: {{ (max_reqs * 3 + max_tunnels) * controller_nr }}
+ worker_rlimit_nofile: {{ (max_reqs + max_tunnels) * 5 * controller_nr }}
events:
- worker_connections: {{ (max_reqs * 3 + max_tunnels) * controller_nr }}
+ worker_connections: {{ (max_reqs + max_tunnels) * 5 * controller_nr }}
### STREAMS
http:
'geo $external_client':
diff --git a/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_passenger.sls b/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_passenger.sls
index 0655a0db1f..0c9ef1c36e 100644
--- a/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_passenger.sls
+++ b/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_passenger.sls
@@ -55,9 +55,10 @@ nginx:
# controller, then potentially 1 from controller back to
# passenger). Each connection consumes a file descriptor.
# That's how we get these calculations
- worker_rlimit_nofile: {{ max_reqs * 3 + 1 + max_tunnels }}
+ # (we're multiplying by 5 instead to be on the safe side)
+ worker_rlimit_nofile: {{ (max_reqs + max_tunnels) * 5 + 1 }}
events:
- worker_connections: {{ max_reqs * 3 + 1 + max_tunnels }}
+ worker_connections: {{ (max_reqs + max_tunnels) * 5 + 1 }}
### SITES
servers:
Updated by Lucas Di Pentima 10 months ago
- % Done changed from 50 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|01ca27ba0a1ef84c53e223004249505435a788b6.