Bug #21260

Updated by Peter Amstutz 5 months ago

Relevant notes on #21160 

 https://dev.arvados.org/issues/21160#note-8 

 https://dev.arvados.org/issues/21160#note-10 

 Summary: controller enforces the request timeout using a context. The timeout is supposed to be API.RequestTimeout, which defaults to 5 minutes, but I am seeing the controller context expire after 1 minute (this might be a separate bug).

 However, Rails / Postgres don't get any signal to stop processing. As a result, the request keeps running even though controller has cut it loose.

 When controller cancels the session, the client gets 500 Internal Server Error. This is treated as a retryable response.

 As a result, the client retries the expensive request _which is still running_, and the retry takes up a second request handler slot. 

 This can cascade: the retry, blocked by the first request (if there are locks involved), times out as well, resulting in another retry that ties up a third request handler slot, and so on.

 To make the system more stable, we should have a mechanism that terminates long-running requests in Rails when they exceed a certain runtime and/or the client hangs up. 
