Bug #20200

Updated by Peter Amstutz about 1 year ago

Live logging from crunch-run has the following behavior: 

 * There is no automatic retry (because arvadosclient does not retry POST) 
 * Regardless of success or failure, the log buffer is discarded, so some logging gets lost 
 * It does not respond to 503 errors by slowing down its own logging rate, but we know from experience that excessive logging is the main cause of the API server getting overwhelmed. 

 Proposed fixes: 

 * On 500 error, do not discard the log buffer 
 * On 500 error, increase crunchLogSecondsBetweenEvents, either by adding 2 seconds or by multiplying it by 1.5 or 2 
 * Remove the check on crunchLogBytesPerEvent, so that log events are never sent more often than the logging interval allows 
