Bug #22454
Updated by Peter Amstutz 27 days ago
We have code in crunch-run that is supposed to detect if a spot instance is going to be reclaimed.
I don't think I have _ever_ seen a spot instance termination notice appear on a user cluster, which makes me suspect that our current implementation doesn't work.
I suggest that we test this ourselves:
Create a container on a spot instance (cheapest possible) that just sleeps forever. Eventually it will be reclaimed (may take up to 24 hours, though!)
For reference, here's the AWS page I found about it:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-instance-termination-notices.html
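For comparison while testing, here is a minimal standalone sketch of the check that page describes. It is not crunch-run's actual code. It polls the instance metadata endpoint `spot/instance-action`, which returns 404 until an interruption is scheduled and then a small JSON document with `action` and `time`. The sketch assumes IMDSv2, so it fetches a session token first. If an instance requires IMDSv2 and a checker sends no token, the request gets a 401 rather than a notice, which may be worth ruling out. The names `spotAction`, `parseInstanceAction`, `fetchToken` and `checkSpotInterruption` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// Link-local address of the EC2 instance metadata service.
const imdsBase = "http://169.254.169.254"

// spotAction mirrors the JSON returned by spot/instance-action.
// (Hypothetical type name, used only in this sketch.)
type spotAction struct {
	Action string    `json:"action"`
	Time   time.Time `json:"time"`
}

// parseInstanceAction interprets a spot/instance-action response.
// A 404 means no interruption is scheduled, so it returns (nil, nil).
func parseInstanceAction(status int, body []byte) (*spotAction, error) {
	switch status {
	case http.StatusNotFound:
		return nil, nil
	case http.StatusOK:
		var a spotAction
		if err := json.Unmarshal(body, &a); err != nil {
			return nil, fmt.Errorf("decoding instance-action: %w", err)
		}
		return &a, nil
	default:
		return nil, fmt.Errorf("unexpected metadata status %d", status)
	}
}

// fetchToken requests a short-lived IMDSv2 session token.
func fetchToken(client *http.Client, base string) (string, error) {
	req, err := http.NewRequest(http.MethodPut, base+"/latest/api/token", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("X-aws-ec2-metadata-token-ttl-seconds", "60")
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("token request returned status %d", resp.StatusCode)
	}
	b, err := io.ReadAll(resp.Body)
	return strings.TrimSpace(string(b)), err
}

// checkSpotInterruption returns the pending interruption, or nil if none.
func checkSpotInterruption(client *http.Client, base string) (*spotAction, error) {
	token, err := fetchToken(client, base)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodGet, base+"/latest/meta-data/spot/instance-action", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-aws-ec2-metadata-token", token)
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseInstanceAction(resp.StatusCode, body)
}

func main() {
	// Short timeout so the check fails fast when run outside EC2.
	client := &http.Client{Timeout: 2 * time.Second}
	action, err := checkSpotInterruption(client, imdsBase)
	switch {
	case err != nil:
		fmt.Println("metadata check failed:", err)
	case action == nil:
		fmt.Println("no interruption scheduled")
	default:
		fmt.Printf("interruption: %s at %s\n", action.Action, action.Time.Format(time.RFC3339))
	}
}
```

Running this in a loop (say, every 5 seconds) inside the sleeping test container would give an independent record of whether and when the notice appeared, to compare against what crunch-run logged.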
Relatedly, I suspect that reclaimed instances show up as cancelled containers whose saved logs stop at "Creating Docker container", since that is the point at which saveLogCollection() gets called. It would be helpful if crunch-run called saveLogCollection() again right before Wait(), so that the saved log tells us whether the container actually started.