Bug #20451

Updated by Peter Amstutz 12 months ago

A process reported a 503 error from UpdateContainerFinal. crunch-run reported that it exited, but the dispatcher still thinks it is running and won't shut down the node. 

 Another process reported "error updating exit_code: 503 error" and later "error in CaptureOutput: error retrieving collection record: 503 error" 

 Also "error saving log collection: error recording logs: %!q(<nil>), "503 error"". 

 Have also seen containers that reported as "Complete" in the live log but don't have the complete log in the collection. 

 Maybe the OOM killer is getting to crunch-run sometimes? Sometimes it just stops logging entirely, and there's nothing useful in the logs stored in Keep, either. Or maybe failing to commit the logs to Keep causes the logging system to seize up? 

 I did see "error updating container log: 503" on another process. 

 Got "docker watchdog: error inspecting container: context deadline exceeded" and "container exited with status code 0" but it is still shown as running. 

 Some of the steps get through most (all?) of the "Copying" lines and then seize up. 

 For some reason many (although not all) of the steps that froze up are running "STAR". 

 An arv-mount exception during setup can result in a stuck mount; maybe it is not handled properly? 
