Feature #21460

Updated by Peter Amstutz 3 months ago

When a spot instance is reclaimed, we don't want to treat it as an instance failure and retry immediately.

Instead, we want to wait a little while (this should be configurable, maybe just add capacityErrorTTL to the config file?) before trying to acquire that instance type again.

When a container has indicated that it was cancelled because its spot instance was reclaimed:

# the spot instance type it was running on should be marked as "at capacity"
 # the time to wait before trying again should be configurable 

If an attempt to allocate a spot instance fails with a "can't get spot instance" error, we should also set the "at capacity" state.
