Bug #21285: Add MaxGatewayTunnels config, separate from MaxConcurrentRequests - Arvados

Custom queries

All assigned issues
All issues assigned for grooming
My issues for grooming
My issues for grooming (no story pts)
Prioritized open issues

Actions

Copy link

Bug #21285

closed

Add MaxGatewayTunnels config, separate from MaxConcurrentRequests

Added by Tom Clegg over 1 year ago. Updated about 1 year ago.

Status:

Resolved

Priority:

Normal

Assigned To:

Lucas Di Pentima

Category:

API

Target version:

Development 2024-01-17 sprint

Story points:

2.0

Release:

Arvados 2.7.2

Release relationship:

Auto

Description

Currently N running containers will start N gateway tunnels that occupy N of the MaxConcurrentRequests slots, even though they don't use the resources that MaxConcurrentRequests is meant to protect (mainly RailsAPI and PostgreSQL). Since each one stays open for the entire duration of the respective container, these tunnel connections can easily consume most/all of the MaxConcurrentRequests slots, leaving none for workbench2 or even other API calls from the containers themselves.

To address this, we should add a separate MaxGatewayTunnels config. Incoming gateway_tunnel requests should not occupy MaxConcurrentRequests slots. After reaching the MaxGatewayTunnels limit, additional gateway_tunnel requests should return 503 immediately rather than wait in a queue. Crunch-run should delay and retry when this happens.

Nginx and load balancers will be expected to allow {MaxConcurrentRequests + MaxQueuedRequests + MaxGatewayTunnels} concurrent requests. Documentation and installer should be updated accordingly.

Files

queue-names.png (28.7 KB) queue-names.png

Lucas Di Pentima, 01/09/2024 10:06 PM

Subtasks 2 (0 open — 2 closed)

	Task #21325: Review 21285-max-gw-tunnels	Resolved	Tom Clegg	01/01/2024			Actions
	Task #21348: Update salt installer (21285-installer-updates)	Resolved	Lucas Di Pentima	01/01/2024			Actions

Related issues 1 (1 open — 0 closed)

Related to Arvados - Bug #21319: Avoid waiting/deadlock when a controller handler performs subrequests against the same controller

New

Actions

Issue # Delay: days Cancel

History
Notes
Property changes
Associated revisions

Actions

Copy link

Updated by Peter Amstutz over 1 year ago

On the topic of tying up request slots --

Requests that rely on external services (e.g. proxying to keep-web, or federation) other than the API server and/or local database should also be tracked and limited separately. There is a known pattern where certain requests to go out to the other service, and then that service contacts the API server back to verify the token, fetch user info, fetch a collection manifest, etc. This can result in deadlock if all the request slots are tied up waiting on external services.

Actions

Copy link

Updated by Tom Clegg over 1 year ago

Related to Bug #21319: Avoid waiting/deadlock when a controller handler performs subrequests against the same controller added

Actions

Copy link

Updated by Tom Clegg over 1 year ago

Assigned To set to Tom Clegg
Status changed from New to In Progress

21285-max-gw-tunnels @ a5af8485c9658e13c6ef40981138731b0db35a68 -- developer-run-tests: #3978
retry wb2 developer-run-tests-services-workbench2: #118

I reworked httpserver.RequestLimiter to have the caller create a number of RequestQueue objects, and provide a function to assign each incoming request to one of them. Each RequestQueue specifies its own MaxConcurrent, MaxQueue, and MaxQueueTimeForMinPriority, as well as a label used for reporting in metrics.

This means we now have a set of metrics for each queue ("api" and "tunnel"), which can be graphed separately and/or in aggregate.

-arvados_concurrent_requests 54
-arvados_max_concurrent_requests 54
-arvados_max_queued_requests 1
-arvados_queue_delay_seconds_count{priority="normal"} 54
-arvados_queue_timeout_seconds_count{priority="normal"} 0
-arvados_queued_requests{priority="normal"} 1
+arvados_concurrent_requests{queue="api"} 24
+arvados_concurrent_requests{queue="tunnel"} 30
+arvados_max_concurrent_requests{queue="api"} 24
+arvados_max_concurrent_requests{queue="tunnel"} 30
+arvados_max_queued_requests{queue="api"} 1
+arvados_max_queued_requests{queue="tunnel"} 0
+arvados_queue_delay_seconds_count{priority="normal",queue="api"} 24
+arvados_queue_delay_seconds_count{priority="normal",queue="tunnel"} 30
+arvados_queue_timeout_seconds_count{priority="normal",queue="api"} 0
+arvados_queue_timeout_seconds_count{priority="normal",queue="tunnel"} 0
+arvados_queued_requests{priority="normal",queue="api"} 1
+arvados_queued_requests{priority="normal",queue="tunnel"} 0

The "tunnel" queue is used for container shell SSH sessions (client->controller->crunchrun) and container gateway tunnels (crunchrun->controller).

✅ All agreed upon points are implemented / addressed.
- New config is MaxGatewayTunnels
- ✅ Crunch-run should delay and retry if tunnel setup endpoint responds 503 (we already had this - crunch-run waits 5s and retries after any error/disconnection)
✅ Anything not implemented (discovered or discussed during work) has a follow-up story.
- Added #21319 for #note-2
✅ Code is tested and passing, both automated and manual, what manual testing was done is described
✅ Documentation has been updated.
✅ Behaves appropriately at the intended scale (describe intended scale).
✅ Considered backwards and forwards compatibility issues between client and server.
✅ Follows our coding standards and GUI style guidelines.

Notes:

☐ Need to update Nginx config in salt installer?

Actions

Copy link

Updated by Lucas Di Pentima about 1 year ago

If every running container requires a tunnel slot, I think that the MaxGatewayTunnels config would behave as an upper limit to the Containers.CloudVMs.MaxInstances setting. Or putting it in another way: in the cloud scenario both config knobs are kind of equivalent (without counting the container shell sessions). Is that a correct understanding?
- Should we check that MaxGatewayTunnels is at least equal to (or greater than) MaxInstances to avoid surprising behaviors?
- Do you think we could simplify configuration by setting MaxGatewayTunnels's default to something related to whatever MaxInstances has?
The docs update doesn't include any mention on a-d-c's section, is that on purpose? Or maybe this branch is only HPC related?
- If containers only use tunnel queue slots on HPC clusters, I think it would be convenient to also mention it on the config file.
For changes on the installer, I will need to wait for this branch to be merged, so that I can do some testing with dev packages.
- I think the grafana dashboard will also need some tweaking.

Actions

Copy link

Updated by Tom Clegg about 1 year ago

Lucas Di Pentima wrote in #note-5:

If every running container requires a tunnel slot, ...

Sorry, the note I put in the config file was not correct. The crunchrun-to-controller tunnel is not set up when using the cloud dispatcher, so this does only apply on HPC, where MaxInstances doesn't apply. I've updated the note in the config file:

      # Maximum number of active gateway tunnel connections. One slot                                                                                                                        
      # is consumed by each "container shell" connection. If using an                                                                                                                        
      # HPC dispatcher (LSF or Slurm), one slot is consumed by each                                                                                                                          
      # running container.  These do not count toward                                                                                                                                        
      # MaxConcurrentRequests.                                                                                                                                                               
      MaxGatewayTunnels: 1000

For changes on the installer, I will need to wait for this branch to be merged, so that I can do some testing with dev packages.

I think the grafana dashboard will also need some tweaking.

SGTM

21285-max-gw-tunnels @ f0908fd5878be533d1208c6a528459945524256b

Actions

Copy link

Updated by Lucas Di Pentima about 1 year ago

Tom Clegg wrote in #note-6:

Sorry, the note I put in the config file was not correct. The crunchrun-to-controller tunnel is not set up when using the cloud dispatcher, so this does only apply on HPC, where MaxInstances doesn't apply. I've updated the note in the config file:
21285-max-gw-tunnels @ f0908fd5878be533d1208c6a528459945524256b

Thanks for the clarification, with that I think 21285-max-gw-tunnels LGTM.

Actions

Copy link

Updated by Tom Clegg about 1 year ago

Assigned To changed from Tom Clegg to Lucas Di Pentima

Merged 21285-max-gw-tunnels.

Leaving ticket open for installer changes.

Actions

Copy link

Updated by Lucas Di Pentima about 1 year ago

Updates at ce3c4f8 - branch 21285-installer-updates

Adds a new optional CONTROLLER_MAX_GATEWAY_TUNNELS variable in local.params's performance tuning section.
Adds API.MaxGatewayTunnels config with default value set to 1000.
Updates balancer & controller nginx configs to account for this new value.

Manually tested on our AWS sandbox account.

Pending: Grafana dashboard update.

Actions

Copy link

#10

Updated by Lucas Di Pentima about 1 year ago

File queue-names.png queue-names.png added

Update at 9f6c2250f6

Adds _{{queue}} to the labels on the concurrent request graph so that new queues get identified on demand.

Actions

Copy link

#11

Updated by Tom Clegg about 1 year ago

My only question is about the rlimit and worker_connections calculations:

      worker_rlimit_nofile: {{ (max_reqs * 3 + max_tunnels) * controller_nr }}

      worker_rlimit_nofile: {{ max_reqs * 3 + 1 + max_tunnels }}

I think we want to add max_tunnels*3 to nofile here: one fd for the incoming connection + one for the outgoing connection to controller + one from controller back to passenger for a permission check.

Also, questioning the existing logic: does nginx really only use one fd for an incoming connection from controller to passenger/rails? Or does it use one for controller->vhost + one for vhost->upstream? Just wondering whether we should be multiplying by 4 instead of 3 here. AFAIK there is no real reason to be stingy with file descriptors once we're going to the trouble to increase the limit...?

Actions

Copy link

#12

Updated by Lucas Di Pentima about 1 year ago

After discussing on chat, I've applied the following diff at ba1937c21e

diff --git a/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_balancer_configuration.sls b/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_balancer_configuration.sls
index 25027b8571..485cf9c12f 100644
--- a/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_balancer_configuration.sls
+++ b/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_balancer_configuration.sls
@@ -16,9 +16,9 @@ nginx:
   ### SERVER
   server:
     config:
-      worker_rlimit_nofile: {{ (max_reqs * 3 + max_tunnels) * controller_nr }}
+      worker_rlimit_nofile: {{ (max_reqs + max_tunnels) * 5 * controller_nr }}
       events:
-        worker_connections: {{ (max_reqs * 3 + max_tunnels) * controller_nr }}
+        worker_connections: {{ (max_reqs + max_tunnels) * 5 * controller_nr }}
       ### STREAMS
       http:
         'geo $external_client':
diff --git a/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_passenger.sls b/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_passenger.sls
index 0655a0db1f..0c9ef1c36e 100644
--- a/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_passenger.sls
+++ b/tools/salt-install/config_examples/multi_host/aws/pillars/nginx_passenger.sls
@@ -55,9 +55,10 @@ nginx:
       # controller, then potentially 1 from controller back to
       # passenger).  Each connection consumes a file descriptor.
       # That's how we get these calculations
-      worker_rlimit_nofile: {{ max_reqs * 3 + 1 + max_tunnels }}
+      # (we're multiplying by 5 instead to be on the safe side)
+      worker_rlimit_nofile: {{ (max_reqs + max_tunnels) * 5 + 1 }}
       events:
-        worker_connections: {{ max_reqs * 3 + 1 + max_tunnels }}
+        worker_connections: {{ (max_reqs + max_tunnels) * 5 + 1 }}

   ### SITES
   servers: