Feature #8018

Updated by Peter Amstutz over 7 years ago

When a container goes into cancelled state, create a new duplicate container when these conditions are met by at least one request:

 * request is in "committed" state (not uncommitted or finalized)
 * request has priority > 0
 * request has remaining retries (how to determine?)

I think we want three failure modes:

 *Error* (there was an infrastructure error, the job should always retry)

 *Invalid* (there's something invalid in the container record; most of the time this should be prevented by API server validation, but if not, some other component can mark the container as impossible to fulfill)

 *Lost* (we've lost track of the container; it is possible it is still running somewhere and will complete, however we should go ahead and retry it just in case) (this probably requires a heartbeat from containers)
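As a rough illustration of the retry conditions and failure modes described above, here is a minimal Python sketch. Every name in it (`ContainerRequest`, `retries_remaining`, `FailureMode`, `should_retry_container`, `failure_mode_allows_retry`) is hypothetical and not taken from the actual API server. In particular, the ticket leaves open how remaining retries are determined, so a plain counter is assumed here, and treating *Invalid* as "never retry" is an interpretation of "impossible to fulfill".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class FailureMode(Enum):
    """The three failure modes proposed in the ticket."""
    ERROR = "Error"      # infrastructure error
    INVALID = "Invalid"  # container cannot be fulfilled
    LOST = "Lost"        # lost track of the container


@dataclass
class ContainerRequest:
    # Hypothetical model; field names are assumptions for illustration.
    state: str              # "uncommitted", "committed" or "finalized"
    priority: int
    retries_remaining: int  # assumed counter; the ticket asks "how to determine?"


def request_wants_retry(req: ContainerRequest) -> bool:
    """True if this single request satisfies all three listed conditions."""
    return (
        req.state == "committed"
        and req.priority > 0
        and req.retries_remaining > 0
    )


def should_retry_container(requests: Iterable[ContainerRequest]) -> bool:
    """A cancelled container is duplicated if at least one request qualifies."""
    return any(request_wants_retry(r) for r in requests)


def failure_mode_allows_retry(mode: FailureMode) -> bool:
    """Error and Lost are retried; Invalid is assumed never to be retried."""
    return mode in (FailureMode.ERROR, FailureMode.LOST)
```

For example, a cancelled container referenced by one finalized request and one committed request with priority 1 and one retry left would be duplicated, because the second request alone meets all three conditions.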
