Idea #7478

Updated by Lucas Di Pentima almost 6 years ago

Functional requirements: 

 * Node manager requests spot instances, waits for those requests to be fulfilled (minutes?), and launches the instances as compute nodes. 
 * For the initial implementation, just bid the standard (on-demand) price rather than designing a fancy bidding strategy. We still get the cost benefit as long as the spot price stays below that bid. 
 * When the spot price rises above our bid (hopefully rarely or never), we're likely to lose our entire fleet of compute instances and may be unable to start new ones until demand subsides enough for spot prices to fall. For this scenario we'll need configuration knobs that control whether to fall back to on-demand instances, wait for spot instances to become available again, etc. 
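The bidding rule and the fallback knob described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the @SpotFallback@ knob, its values, and @choose_market@ are hypothetical names, not existing node manager config options. The function only decides where to request *new* nodes. Running spot instances that are reclaimed when the price exceeds the bid are outside its scope.

```python
from enum import Enum
from typing import Optional


class SpotFallback(Enum):
    """Hypothetical config knob: what to do when the spot price exceeds our bid."""
    ON_DEMAND = "on_demand"  # request an equivalent on-demand instance instead
    WAIT = "wait"            # request nothing until spot prices come back down


def choose_market(spot_price: float, on_demand_price: float,
                  policy: SpotFallback) -> Optional[str]:
    """Return "spot", "on_demand", or None (meaning: keep waiting)."""
    # Initial strategy from the requirements: bid the standard on-demand price.
    bid = on_demand_price
    if spot_price <= bid:
        return "spot"
    # Spot capacity is priced above our bid; apply the configured fallback.
    if policy is SpotFallback.ON_DEMAND:
        return "on_demand"
    return None
```

Keeping the bid equal to the on-demand price means falling back to on-demand never costs more than the bid itself, which matches the "no fancy bidding strategy" requirement.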

 Implementation details: 

 * Enhance libcloud to support AWS spot instances. (Done) 
 * The API server will have a config option that specifies whether spot instances are enabled. If they are, child containers will be created with the spot instance scheduling parameter set. 
 * Spot instances will be their own instance type, so node manager needs to manage instance types separately from the libcloud-specified instance type it currently relies on. Node manager will use the new libcloud support to request spot instances when needed. No arvados-cwl-runner changes are required. 
 * Node manager spot instance handling: 
 ** @[Size <name>]@ config sections currently use the instance type as <name>. Decouple the two: add an @instance_type@ attribute inside the section and keep <name> for description purposes only. 
 ** Each size section will have a boolean @preemptable@ attribute, defaulting to False. 
 ** Update ServerCalculator and related code so that the instance type is no longer the unique ID of a "nodesize". 
 ** Update the EC2 driver to pass the @ex_spot_market=True@ parameter in the libcloud @create_node@ call. 
 * Update the documentation to explain the node manager config file format changes.
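The proposed config format change could be sketched like this, using Python's standard @configparser@. The sketch makes several assumptions. The helper names @load_sizes@ and @ec2_extra_kwargs@ are hypothetical. Falling back to <name> when @instance_type@ is missing is only one possible way to keep old-style configs working. The @ex_spot_market@ keyword is taken from the item above rather than from upstream libcloud. Sizes are keyed by section name, so two sizes can share an instance type, which is the point of the ServerCalculator change.

```python
import configparser
from dataclasses import dataclass
from typing import Dict

# Example config: an on-demand and a spot size sharing one instance type.
SAMPLE_CONFIG = """
[Size m4.large]
cores = 2

[Size m4.large.spot]
instance_type = m4.large
preemptable = true
"""


@dataclass(frozen=True)
class NodeSize:
    name: str           # free-form label from the [Size <name>] header
    instance_type: str  # cloud instance type; no longer unique per size
    preemptable: bool   # True means request a spot instance


def load_sizes(text: str) -> Dict[str, NodeSize]:
    """Parse [Size ...] sections, keyed by section name (not instance type)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sizes = {}
    for section in parser.sections():
        if not section.startswith("Size "):
            continue
        name = section[len("Size "):]
        sizes[name] = NodeSize(
            name=name,
            # Assumption: old-style sections without instance_type keep
            # using <name> as the instance type.
            instance_type=parser.get(section, "instance_type", fallback=name),
            preemptable=parser.getboolean(section, "preemptable", fallback=False),
        )
    return sizes


def ec2_extra_kwargs(size: NodeSize) -> Dict[str, bool]:
    """Extra create_node keyword arguments for a size (hypothetical helper)."""
    return {"ex_spot_market": True} if size.preemptable else {}


if __name__ == "__main__":
    for size in load_sizes(SAMPLE_CONFIG).values():
        print(size.name, size.instance_type, ec2_extra_kwargs(size))
```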
