Bug #20606
Updated by Tom Clegg over 1 year ago
To reproduce:

* Wait for a time when there are no preemptible instances available in the size needed by your workflow
* Start the workflow with preemptible instances enabled
* Wait for the workflow to create a lot of child container requests
* Cancel the workflow (this might be optional: the workflow might cancel by itself when a child container reaches maximum lock/unlock cycles)
* Wait for all child containers to get cancelled
* Restart the workflow with preemptible instances disabled
* Wait for the workflow to create child container requests
* The child container requests reuse the existing (queued, priority 0) containers with preemptible:true, and the preemptible instances still aren't available, so they continue to fail after a few lock/unlock cycles

In #19917 we ensured a new container would be scheduled with preemptible:false in this case. However, that doesn't help at all if the container requests aren't going to auto-retry because container_count:1.

Ideally, if we have a pair of requests (preemptible:true and preemptible:false) and a queued container with preemptible:true, it is probably better to start a container with preemptible:false and use it for both requests, although I think this will be inconvenient to implement.

The easier situation, which we encountered today, has a container that matches reuse criteria but has preemptible:true _and_ isn't about to run because it has priority 0 (i.e., whatever CR requested it has since been cancelled/failed). In this situation we should have created a new container with preemptible:false instead of using the existing one.

Another easy change: when a request has preemptible:false, don't reuse a container with preemptible:true that is in Queued or Locked state, even if it has priority>0, because there's a relatively high likelihood it will fail (especially considering the common pattern of "start non-preemptible workflow because preemptible workflow is not getting anywhere").
I think it's OK if this is wasteful in the (less common?) cases where the preemptible containers are running well.
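
The two "easy" rules above could be sketched roughly as follows. This is only an illustration of the proposed decision, not the actual reuse code: the `Container` class, the `may_reuse` function, and the `skip_active_preemptible` flag are hypothetical names, and the candidate is assumed to have already passed the usual reuse criteria. Leaving the behaviour of preemptible:true requests unchanged is also an assumption.

```python
from dataclasses import dataclass

# States in which a container has not started running yet.
NOT_YET_RUNNING = {"Queued", "Locked"}


@dataclass
class Container:
    """Hypothetical stand-in for a container record (illustration only)."""
    state: str         # e.g. "Queued", "Locked", "Running"
    priority: int      # 0 means no live request currently wants it to run
    preemptible: bool  # whether it would be scheduled on a preemptible instance


def may_reuse(request_preemptible: bool, candidate: Container,
              skip_active_preemptible: bool = True) -> bool:
    """Decide whether a request may reuse `candidate`.

    Assumes `candidate` already matches the normal reuse criteria;
    only the preemptible-related rules from this ticket are applied.
    """
    # Assumption: preemptible requests, and non-preemptible candidates,
    # keep the existing behaviour.
    if request_preemptible or not candidate.preemptible:
        return True
    # From here on: non-preemptible request, preemptible candidate.
    if candidate.state not in NOT_YET_RUNNING:
        return True  # already running (or finished): fine to share
    if candidate.priority == 0:
        # Rule 1: the container isn't about to run at all, so a new
        # preemptible:false container should be created instead.
        return False
    # Rule 2: it may run, but has a relatively high chance of failing.
    return not skip_active_preemptible
```

With both rules enabled, `may_reuse(False, Container("Queued", 0, True))` and `may_reuse(False, Container("Locked", 5, True))` both reject the candidate, while a preemptible container that is already `Running` is still shared. Passing `skip_active_preemptible=False` applies only the priority 0 rule.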