Bug #21287

Binning and throttling incoming and outgoing requests

Added by Peter Amstutz 11 months ago. Updated 4 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: API
Target version:
Story points: -

Description

Originally from:

https://dev.arvados.org/issues/21285#note-2

In order to service a request, controller can do a number of things:

  1. Forward it to the local Rails API server
  2. Handle it entirely within controller (by querying the local database itself)
  3. Query another service (keep-web, or a crunch-run process on a compute node)
  4. Query another Arvados instance (federated queries)

In the third and fourth cases, we don't have full control over what the other service does. However, there are existing patterns in the keep-web and federated cases where the remote service queries back to our controller in order to verify an API token, retrieve a user record, or get other data.

We've specifically observed this with keep-web, where:

  1. the Workbench 2 process page sends requests for all the log collection files at once
  2. this hits controller's request limit
  3. keep-web sends a request back to verify a token
  4. the token verification request is stuck in the queue behind the outstanding requests that were proxied to keep-web, which are waiting on keep-web, which in turn is waiting on the token verification
  5. the system is deadlocked until something times out

The current fix is to make sure the minimum request limit is high enough that we don't do this to ourselves.

We could get into a similar situation with federation, but an even simpler problem is a remote service in a slow, broken, or malicious state that acts as a tar pit, causing queries to hang for a long time. If the queue fills up with outstanding requests, the system becomes unusable. (Of course, this is also possible with slow Rails/database requests, but the sysadmin has more control over those.)

Proposed solution

Limit both incoming and outgoing requests.

  • determine each request's priority and timestamp, which together define its position in the priority queue
  • start handling up to MaxConcurrentRequests incoming requests in priority order, with throttling
  • when a request handler is going to make an outgoing request to Rails, acquire another throttled lock (up to MaxConcurrentRailsRequests) for that category of outgoing request
    • the request acquires the Rails lock in priority order
  • also bin requests into categories, e.g.
    • requests that get information about the token, e.g. current user or current token
    • requests that proxy to keep-web
    • container gateway requests (already implemented)
    • everything else
Actions #1

Updated by Peter Amstutz 11 months ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz 11 months ago

  • Category set to API
Actions #3

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 11 months ago

  • Status changed from In Progress to New
Actions #5

Updated by Peter Amstutz 11 months ago

  • Subject changed from MaxExternalRequests config to MaxForwardedRequests config
Actions #6

Updated by Peter Amstutz 11 months ago

  • Subject changed from MaxForwardedRequests config to MaxProxiedRequests config
Actions #7

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
  • Subject changed from MaxProxiedRequests config to MaxProxiedRequests config
Actions #8

Updated by Peter Amstutz 10 months ago

  • Target version changed from Development 2024-01-17 sprint to Development 2024-01-31 sprint
Actions #9

Updated by Peter Amstutz 10 months ago

  • Description updated (diff)
  • Subject changed from MaxProxiedRequests config to Throttle both incoming and outgoing requests
Actions #10

Updated by Peter Amstutz 10 months ago

  • Description updated (diff)
Actions #11

Updated by Peter Amstutz 10 months ago

  • Subject changed from Throttle both incoming and outgoing requests to Binning and throttling incoming and outgoing requests
Actions #12

Updated by Peter Amstutz 10 months ago

  • Target version changed from Development 2024-01-31 sprint to Development 2024-02-14 sprint
Actions #13

Updated by Peter Amstutz 10 months ago

  • Target version changed from Development 2024-02-14 sprint to Development 2024-02-28 sprint
Actions #14

Updated by Peter Amstutz 9 months ago

  • Target version changed from Development 2024-02-28 sprint to Development 2024-03-13 sprint
Actions #15

Updated by Peter Amstutz 9 months ago

  • Target version changed from Development 2024-03-13 sprint to Development 2024-03-27 sprint
Actions #16

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-03-27 sprint to Development 2024-04-24 sprint
Actions #17

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-04-10 sprint
Actions #18

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-04-10 sprint to Development 2024-04-24 sprint
Actions #19

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #20

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-06-05 sprint
Actions #21

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-06-05 sprint to 439
Actions #22

Updated by Peter Amstutz 6 months ago

  • Target version changed from 439 to Development 2024-07-03 sprint
Actions #23

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-07-03 sprint to Development 2024-08-07 sprint
Actions #24

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-08-07 sprint to Future