Project

General

Profile

Actions

Support #20680

closed

installer and defaults changes

Added by Peter Amstutz 11 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Deployment
Target version:
Due date:
Story points:
-
Release relationship:
Auto

Description

The

arvados config:

    API:
      MaxConcurrentRequests: {{ ("__CONTROLLER_NGINX_WORKERS__" or grains['num_cpus'])|int * 2 }}
      MaxQueuedRequests: __CONTROLLER_MAX_CONCURRENT_REQUESTS__

nginx config:

  passenger:
    passenger_ruby: {{ passenger_ruby }}
    passenger_max_pool_size: {{ "__CONTROLLER_NGINX_WORKERS__" or grains['num_cpus'] }}
    {%- if max_reqs != "" %}
    # Default is 100 -- Configuring this a bit higher than API.MaxConcurrentRequests
    # to be able to handle /metrics requests even on heavy load situations.
    passenger_max_request_queue_size: {{ ("__CONTROLLER_NGINX_WORKERS__" or grains['num_cpus'])|int * 2 + 1 }}
    {%- endif %}

We should rename NGINX_WORKERS to something like CONTROLLER_WORKERS or CONTROLLER_REQUEST_HANDLERS

We should rename CONTROLLER_MAX_CONCURRENT_REQUESTS in local.params CONTROLLER_MAX_QUEUED_REQUESTS

We should also modify some defaults:

      # Maximum number of concurrent requests to process concurrently
      # in a single service process, or 0 for no limit.
      MaxConcurrentRequests: 4

      # Maximum number of incoming requests to hold in a priority
      # queue waiting for one of the MaxConcurrentRequests slots to be
      # free. When the queue is longer than this, respond 503 to the
      # lowest priority request.
      #
      # If MaxQueuedRequests is 0, respond 503 immediately to
      # additional requests while at the MaxConcurrentRequests limit.
      MaxQueuedRequests: 128

      # Number of times a container can be unlocked before being
      # automatically cancelled.
      MaxDispatchAttempts: 20

Subtasks 3 (0 open3 closed)

Task #20681: Review 20680-default-config-updates ResolvedPeter Amstutz08/04/2023Actions
Task #20833: Review 20680-rolling-deployResolvedPeter Amstutz08/04/2023Actions
Task #20854: Review 20680-cores-concurrencyResolvedPeter Amstutz08/04/2023Actions

Related issues

Related to Arvados - Bug #21124: Separate request limits for API server and KeepstoreResolvedTom Clegg10/27/2023Actions
Actions #1

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz 10 months ago

  • Status changed from New to In Progress
Actions #3

Updated by Peter Amstutz 10 months ago

  • Target version changed from Development 2023-07-05 sprint to Development 2023-07-19 sprint
Actions #4

Updated by Peter Amstutz 10 months ago

  • Target version changed from Development 2023-07-19 sprint to Development 2023-08-02 sprint
Actions #5

Updated by Peter Amstutz 9 months ago

  • Target version changed from Development 2023-08-02 sprint to Development 2023-08-16
Actions #6

Updated by Peter Amstutz 9 months ago

20680-default-config-updates @ e6fd711d2ddd776bc6a9171a37eab13a1a2e4584

We now have separate values in the Arvados config for concurrent requests and queued
requests, and setting them correctly is very important or performance.

Ensures the arvados, passenger and nginx configurations align
with the correct values.

Renamed CONTROLLER_NGINX_WORKERS to CONTROLLER_MAX_WORKERS

Renamed CONTROLLER_MAX_CONCURRENT_REQUESTS to
CONTROLLER_MAX_QUEUED_REQUESTS

Adjusted config.yml defaults to reflect that MaxConcurrentRequests
means something different.

Actions #7

Updated by Peter Amstutz 9 months ago

20680-default-config-updates @ d32c7034ae35872f7dda683b46bdddff17d8b2cd

Rebased this on to 20610-installer-load-balancer to preemptively handle merge conflicts.

Actions #8

Updated by Lucas Di Pentima 9 months ago

Just a couple of comments:

  • There's an CONTROLLER_MAX_CONCURRENT_REQUESTS use on multi_host/aws/pillars/nginx_balancer_configuration.sls that needs adjusting.
  • At local.params.example.multiple_hosts:L142 Maybe it'll be useful to add a comment on how our defaults work, for those users that won't know where to look for them.

Also, a question: Is the defaults difference for CONTROLLER_MAX_QUEUED_REQUESTS (128 vs 1024) on purpose?

With that, I think it would be good to merge.

Actions #9

Updated by Peter Amstutz 9 months ago

  • Status changed from In Progress to Resolved
Actions #10

Updated by Peter Amstutz 9 months ago

  • Status changed from Resolved to In Progress

20680-rolling-deploy @ 2c583f85220cd6d4e2aabb0ad2753cac6f9065a6

When there is a load balancer, disable each controller node in turn
before updating it.

Also, don't reduce controller_nr when a controller is down, having
worker_connections go up and down during deploy seems like in could
cause problems.

Actions #11

Updated by Lucas Di Pentima 9 months ago

Just a couple comments:

  • There's the edge case when there's just one balanced controller. In that case, disabling it from the balancer wouldn't be useful.
  • In the installer.sh script, the conditional & for-loop code are in the same indentation level, making it difficult to follow.
Actions #12

Updated by Peter Amstutz 9 months ago

Lucas Di Pentima wrote in #note-11:

Just a couple comments:

  • There's the edge case when there's just one balanced controller. In that case, disabling it from the balancer wouldn't be useful.

Fixed.

  • In the installer.sh script, the conditional & for-loop code are in the same indentation level, making it difficult to follow.

I double-checked and the indentation seems correct, can you show me what you think is wrong?

20680-rolling-deploy @ 810f82cb348d83fd09492384a9b31b31a5e3c9e9

Actions #13

Updated by Lucas Di Pentima 9 months ago

This LGTM, thanks.

Actions #14

Updated by Peter Amstutz 9 months ago

  • Status changed from In Progress to Resolved
Actions #15

Updated by Peter Amstutz 9 months ago

  • Status changed from Resolved to Feedback

From some additional experience, the number of workers should be 2x the number of cores, with a minimum of 8, I think. Need to update this.

Actions #16

Updated by Lucas Di Pentima 9 months ago

20680-cores-concurrency LGTM, thanks.

Actions #17

Updated by Peter Amstutz 9 months ago

  • Status changed from Feedback to Resolved
Actions #18

Updated by Peter Amstutz 9 months ago

  • Release set to 66
Actions #19

Updated by Tom Clegg 6 months ago

  • Related to Bug #21124: Separate request limits for API server and Keepstore added
Actions

Also available in: Atom PDF