Bug #22454

Updated by Peter Amstutz 27 days ago

We have code in crunch-run that is supposed to detect if a spot instance is going to be reclaimed. 

 I don't think I have _ever_ seen a spot instance termination notice actually appear on a user cluster, which makes me suspicious that our current implementation doesn't actually work. 

 I suggest that we test this ourselves: 

 Create a container on a spot instance (the cheapest possible) that just sleeps forever. Eventually it will be reclaimed (though that may take up to 24 hours!).

 For reference, here's the AWS page I found about it: 

 https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-instance-termination-notices.html 

 Relatedly, my suspicion is that reclaimed instances show up as cancelled containers that only log up to "Creating Docker container", because that is the point at which saveLogCollection() gets called. It would be helpful if crunch-run called saveLogCollection() again right before Wait(), so that we know the container actually started.
