Bug #20614

Updated by Peter Amstutz 9 months ago

If container_count > 1, Workbench 2 renders a message like "Warning: Process retried 1 time due to failure."
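A minimal sketch of that rule, for reference. This is not the actual Workbench 2 code: the function name `retry_warning` is hypothetical, and the plural form "times" is an assumption, since only the singular message is quoted above.

```python
from typing import Optional


def retry_warning(container_count: int) -> Optional[str]:
    """Return the retry warning text, or None when there was no retry.

    Hypothetical helper: container_count counts every container attempt,
    so the number of retries is container_count - 1.
    """
    retries = container_count - 1
    if retries < 1:
        return None
    # Assumption: the message pluralizes "time" for more than one retry.
    unit = "time" if retries == 1 else "times"
    return f"Warning: Process retried {retries} {unit} due to failure."
```

With container_count = 2 this yields the message quoted above, whether or not the second container ever produced logs.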


The problem is that the log collection doesn't seem to have any record of the first attempt.

We need to figure out why the first failure is not included in the log collection (and then maybe fix what's actually failing).


Actually, the first failure was recorded, although it is unclear why it failed. The new container was cancelled before it started, so it never created any logs to be recorded. The result is a confusing log collection that shows only one container (the old one).

 We need a way to communicate this situation better. 
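One possible direction, sketched under assumptions: compare container_count with the number of containers that actually have logs in the log collection, and say so explicitly when they differ. The function name `summarize_attempts`, its `logged_uuids` parameter, and the message wording are all hypothetical; how the list of logged containers would be obtained is not covered here.

```python
from typing import List


def summarize_attempts(container_count: int, logged_uuids: List[str]) -> str:
    """Describe how many container attempts left logs behind.

    Hypothetical helper: logged_uuids is assumed to list the containers
    that have logs in the log collection.
    """
    # Attempts that never wrote logs, e.g. cancelled before starting.
    unlogged = max(container_count - len(logged_uuids), 0)
    if unlogged == 0:
        return f"{container_count} attempt(s), all with logs."
    return (
        f"{container_count} attempt(s); {unlogged} produced no logs "
        "(for example, cancelled before the container started)."
    )
```

For the case described above (two attempts, logs from only the old container), this reports that one attempt produced no logs, so a log collection with a single container reads as expected rather than broken.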


We need to figure out why these processes are actually getting killed. The API server was under high load; maybe this is causing things to time out, so that the containers are somehow considered abandoned?