Bug #21260

Updated by Peter Amstutz about 1 year ago

Relevant notes on #21160 

 https://dev.arvados.org/issues/21160#note-8 

 https://dev.arvados.org/issues/21160#note-10 

 Summary: the controller enforces the request timeout using a context. The timeout is supposed to be API.RequestTimeout, which defaults to 5 minutes, but I am seeing the controller context expire after 1 minute (this might also be a bug).

 However, Rails and Postgres don't get any signal to stop processing. As a result, the request continues processing despite having been cut loose by the controller.

 When the controller cancels the session, the client gets 500 Internal Server Error. This is treated as a retryable response.

 As a result, the client retries the expensive request _which is still running_, and the retry takes up a second request handler slot. 

 This can cascade: the retry times out while blocked by the first request (if there are locks involved), resulting in another retry which ties up a third request handler slot, and so on.
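 A rough model of the cascade, to show how quickly slots fill up. This is a sketch under simplifying assumptions that are not from the observed system: the client retries immediately on each timeout, every attempt runs for the same `runtime`, and lock contention (which would only make things worse) is ignored. `busySlots` and its parameters are hypothetical.

```go
package main

import "fmt"

// busySlots returns how many request handler slots are occupied at time t
// (all times in seconds). Attempt k starts at k*timeout, because the client
// retries as soon as the controller gives up, and each attempt keeps its
// slot for runtime seconds because the backend is never told to stop.
func busySlots(t, timeout, runtime, maxAttempts int) int {
	if runtime <= timeout {
		// The first attempt finishes in time, so there are no retries.
		maxAttempts = 1
	}
	busy := 0
	for k := 0; k < maxAttempts; k++ {
		start := k * timeout
		if start <= t && t < start+runtime {
			busy++
		}
	}
	return busy
}

func main() {
	// Illustrative numbers only: a 60 s controller timeout, a request that
	// takes 600 s, and a client that gives up after 10 attempts.
	for _, t := range []int{0, 60, 120, 300} {
		fmt.Printf("t=%ds busy slots=%d\n", t, busySlots(t, 60, 600, 10))
	}
}
```

 In this model a single slow request occupies one more slot per timeout interval until the client exhausts its retries.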

 To make the system more stable, we should have a mechanism that terminates long-running requests in Rails when they exceed a certain runtime and/or the client hangs up. 

 We might want to use this: 

 https://github.com/ankane/slowpoke 

 This specifically supports Passenger and tells Passenger to abandon the Ruby process on timeout (which is fine, because we use Passenger in forked multiprocess mode, since threaded mode is "enterprise only").
