Bug #20200

Updated by Peter Amstutz about 1 year ago

Live logging from crunch-run has the following behavior: 

 * There is no automatic retry (because arvadosclient does not retry POST) 
 * Regardless of success or failure, the log buffer is discarded, so some logging gets lost 
 * It does not respond to 503 errors by slowing down its own logging rate, but we know from experience that excessive logging is the main cause of the API server getting overwhelmed. 

 Proposed fixes: 

 * On 500 error, do not discard the log buffer 
 * On 500 error, increase crunchLogSecondsBetweenEvents by 2 seconds, or multiply it by 1.5 or 2 
 * Remove the check on crunchLogBytesPerEvent so that log events are not sent more often than the logging interval allows 
