Bug #21287

Updated by Peter Amstutz 4 months ago

Originally from: 

 https://dev.arvados.org/issues/21285#note-2 

 In order to service a request, controller can do a number of things: 

 # Forward it to the local Rails API server 
 # Handle it entirely within controller (by querying the local database itself) 
 # Query another service (keep-web, or a crunch-run process on a compute node) 
 # Query another Arvados instance (federated queries) 

 In cases 3 and 4, we don't have full control over what the other service is going to do, but we do have existing patterns in the keep-web and federated cases where the remote service queries back to our controller in order to verify an API token, retrieve a user record, or get other data. 

 We've specifically observed this with keep-web, where: 

 # the Workbench 2 process page sends requests for all the log collection files at once 
 # this hits controller's request limit  
 # keep-web sends a request back to verify a token 
 # the token verification request is stuck behind the outstanding requests that were proxied to keep-web, which are waiting on keep-web, which is in turn waiting on the token verification 
 # the system is deadlocked until something times out 

 The current fix is to make sure the minimum request limit is high enough that we don't do this to ourselves. 
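The sequence above can be reproduced in a toy model. This is only a sketch, not controller or keep-web code: @simulate@, its parameters, and the second barrier (which keeps the outcome deterministic by preventing one request's timeout from freeing a slot for another) are assumptions made for illustration. Each proxied request holds one controller slot while the callback that verifies its token needs a second one.

```python
import threading

def simulate(limit, n_requests, timeout=0.5):
    """Return how many proxied requests complete before the callback times out.

    Hypothetical model: `limit` stands in for controller's request limit and
    `n_requests` for the log file requests that arrive at once.
    """
    assert n_requests <= limit
    slots = threading.Semaphore(limit)     # controller's concurrent request slots
    sync = threading.Barrier(n_requests)   # lets all requests reach the same step
    completed = []

    def proxied_request(i):
        slots.acquire()                    # controller accepts and proxies the request
        try:
            sync.wait()                    # every request now holds a slot
            # keep-web calls back to controller to verify the token,
            # and that callback needs a slot of its own.
            if slots.acquire(timeout=timeout):
                slots.release()            # token verified
                completed.append(i)        # keep-web can answer the request
            sync.wait()                    # hold our slot until all attempts are done
        finally:
            slots.release()

    threads = [threading.Thread(target=proxied_request, args=(i,))
               for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)
```

With @simulate(4, 4)@ every slot is held by a proxied request, so no callback gets through and nothing completes until the timeout. With @simulate(5, 4)@ the one spare slot is enough for the callbacks to be served in turn, which is the effect of the current fix.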

 We could get into a similar situation with federation, but an even simpler problem is a remote service in a slow, broken, or malicious state that acts as a tar pit, causing queries to hang for a long time. If the queue fills with outstanding requests, the system becomes unusable. (Of course, this is also possible with slow Rails/database requests, but the sysadmin has more control over those.) 

 h2. Proposed solution 

 Limit both incoming and outgoing requests. 

 * determine request priority and timestamp for priority queue order 
 * start handling incoming requests in priority order, with throttling 
 * when a request handler is going to make an outgoing request to Rails, acquire another throttled lock for that category of outgoing request 
 ** the request acquires the lock in priority order 

 I propose a config limit MaxProxiedRequests (name up for discussion) that limits the number of category 3 or 4 requests such that requests in category 1 or 2 can still be processed. 
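One way the MaxProxiedRequests idea could work is sketched below. Since the implementation is still to be decided, everything here is an assumption: the @RequestLimiter@ class, its method names, and the choice to refuse a request immediately rather than queue it. Priority ordering is not modelled. The categories are the four numbered cases at the top of this issue.

```python
import threading

PROXIED = {3, 4}   # another service, or another Arvados instance

class RequestLimiter:
    """Hypothetical limiter: an overall cap plus a lower cap on proxied requests."""

    def __init__(self, max_concurrent, max_proxied):
        if max_proxied >= max_concurrent:
            # Otherwise proxied requests could still occupy every slot.
            raise ValueError("max_proxied must be lower than max_concurrent")
        self._lock = threading.Lock()
        self.max_concurrent = max_concurrent
        self.max_proxied = max_proxied
        self.active = 0
        self.active_proxied = 0

    def try_acquire(self, category):
        """Reserve a slot for a request of the given category (1 to 4)."""
        with self._lock:
            if self.active >= self.max_concurrent:
                return False
            if category in PROXIED:
                if self.active_proxied >= self.max_proxied:
                    return False
                self.active_proxied += 1
            self.active += 1
            return True

    def release(self, category):
        """Give back the slot reserved by a successful try_acquire."""
        with self._lock:
            self.active -= 1
            if category in PROXIED:
                self.active_proxied -= 1
```

For example, with @RequestLimiter(max_concurrent=4, max_proxied=2)@, two requests proxied to keep-web fill the proxied quota and a third is refused, while a token verification handled locally (category 1 or 2) still gets a slot.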

 Exact implementation TBD. 
