Idea #13051 (closed)
Spike - Investigate/prototype AWS spot instance support in libcloud
Description
Based on some cursory investigation we'd like to pursue integration of AWS spot instance support into Apache libcloud rather than using boto3 to provide this.
Possible sources for inspiration / code include:
- prototype forked from libcloud (4 years old, so could be bitrotted) - https://github.com/muccg/libcloud-drivers
- boto3 implementation - http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.Client.request_spot_instances
- AWS docs - https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RequestSpotInstances.html and https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html
Related issues
Updated by Tom Morris over 6 years ago
- Blocks Idea #7478: [Node Manager] Creates compute nodes using AWS spot instances added
Updated by Tom Morris over 6 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
Updated by Tom Morris over 6 years ago
- Target version changed from Arvados Future Sprints to 2018-04-25 Sprint
Updated by Lucas Di Pentima over 6 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima over 6 years ago
Finally got limits upped on my AWS account so I was able to do some testing.
The updates are available at our libcloud fork: https://github.com/curoverse/libcloud/tree/ec2-spot-instances
There's an example script that creates a spot request for a t2.micro instance, asking to pay half the price, then waits for the request to be fulfilled, prints the instance id, and finally stops everything.
The prototype I based this on (https://github.com/muccg/libcloud-drivers) was written as an EC2NodeDriver subclass, but I don't think that's necessary, because a spot instance is just a normal instance that happens to cost less. The only difference is the way it is created, so just adding those methods to the EC2NodeDriver class will simplify our integration with nodemanager, in my opinion.
Updated by Peter Amstutz over 6 years ago
- Target version changed from 2018-04-25 Sprint to 2018-05-09 Sprint
Updated by Peter Amstutz over 6 years ago
I think what we want is to make spot instance requests act as much like regular node requests as possible. This means Node manager can use the same create_node() and destroy_node() methods for spot instances, and we don't have to re-architect node manager with a special code path for spot instances.
So, for example, create_node() would check for the keyword argument ex_spot_price and go down an alternate code path that creates a spot instance.
It looks like the most substantial API difference with request_spot_instances() is that create_node() returns a Node object, but request_spot_instances() returns an EC2SpotRequest. Maybe we could cram the extra fields in EC2SpotRequest into the extra field of the Node object.
Similarly, destroy_node() would check if a Node is actually a spot instance, and cancel the spot instance request instead of destroying the node directly.
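To make the proposal above concrete, here is a minimal sketch of the dispatch it describes. It does not use libcloud: Node is a stand-in dataclass, and FakeEC2Backend, SketchDriver and the key names stored in extra are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    """Stand-in for libcloud's Node; only the fields used in this sketch."""
    id: str
    extra: Dict[str, Any] = field(default_factory=dict)


class FakeEC2Backend:
    """Hypothetical in-memory backend that records the calls made to it."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def run_instance(self) -> str:
        self.calls.append("run_instance")
        return "i-0001"

    def request_spot_instance(self, price: str) -> Dict[str, Any]:
        self.calls.append("request_spot_instance")
        return {"spot_request_id": "sir-0001", "spot_price": price, "state": "open"}

    def cancel_spot_request(self, request_id: str) -> None:
        self.calls.append("cancel_spot_request")

    def terminate_instance(self, instance_id: str) -> None:
        self.calls.append("terminate_instance")


class SketchDriver:
    """Hypothetical driver: one create/destroy API for both kinds of node."""

    def __init__(self, backend: FakeEC2Backend) -> None:
        self.backend = backend

    def create_node(self, ex_spot_price: Optional[str] = None) -> Node:
        if ex_spot_price is None:
            # Regular on-demand path.
            return Node(id=self.backend.run_instance())
        # Alternate path: make a spot request and keep its fields in `extra`.
        request = self.backend.request_spot_instance(ex_spot_price)
        return Node(id=request["spot_request_id"], extra=dict(request))

    def destroy_node(self, node: Node) -> bool:
        request_id = node.extra.get("spot_request_id")
        if request_id is not None:
            # Spot node: cancel the request instead of destroying directly.
            self.backend.cancel_spot_request(request_id)
        else:
            self.backend.terminate_instance(node.id)
        return True
```

With this shape, the caller never needs to know which kind of node it holds; the only spot-specific input is the ex_spot_price keyword argument.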
What do you think?
Updated by Lucas Di Pentima over 6 years ago
Update at commit 4411158c
- https://github.com/curoverse/libcloud/commit/4411158cc224b8b9e1482e26514a48d918a8b9ec
As suggested, refactored the code so that spot instance creation works the same way as for on-demand instances:
- New class EC2SpotNode acts as a proxy by providing a placeholder Node object and cached access to the spot instance request.
- Implemented create_node on the EC2NodeDriver class to check whether the ex_spot_price argument is provided and, if so, make a spot request and return EC2SpotNode object(s).
- Implemented destroy_node on the EC2NodeDriver class so that, before calling the superclass method, it checks whether the request is actually fulfilled; if it isn't, it just cancels the spot request.
Also updated the example script example_compute_ec2spot.py to be able to try the new code. If you want to test an unfulfilled spot request, you can pass ex_spot_price as 0.001 and re-run the script: after 3 tries it'll cancel the request.
- Update and create more tests
- The reboot_node method could also be overridden, although I'm not sure what the behavior should be when asked to reboot a nonexistent node. Maybe just raise an exception.
Updated by Lucas Di Pentima over 6 years ago
Ahem, I've just found that "recently" (last year), Amazon released a new, supposedly easier way of launching Spot Instances from the RunInstances action:
https://aws.amazon.com/blogs/compute/new-amazon-ec2-spot-pricing/
...in the docs, this feature is buried under an awkward name: MarketOptions:
https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_InstanceMarketOptionsRequest.html
I don't know whether this new way simplifies things by making spot request accounting unnecessary; there's no clear explanation of how it behaves. I could run some tests to see whether this option returns an instance id as usual. If it does, the needed changes would be minimal and a lot cleaner.
Updated by Lucas Di Pentima over 6 years ago
Made a new prototype branch: ec2-spot-market
- https://github.com/curoverse/libcloud/commit/6ef07ca46ea0f928c426a746d23a578ba0a9dc54
Just adding one detail to the usual RunInstances call is enough to ask for a spot request with default price. The call returns the instance id immediately as with on-demand instances. I think this is the way to go.
Updated by Lucas Di Pentima over 6 years ago
Update on the new branch: https://github.com/curoverse/libcloud/commit/5c1656ace02eaa50e89907a2ed085a2843d232a6
Added spot price support
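For readers who don't want to open the commits, the following is a rough sketch of what "one detail" on the RunInstances call can look like at the EC2 Query API level. It is not the code from the branch: build_run_instances_params and its argument names are hypothetical, and the dotted parameter names are assumed from the InstanceMarketOptionsRequest documentation linked earlier in this thread.

```python
from typing import Dict, Optional


def build_run_instances_params(
    image_id: str,
    instance_type: str,
    spot: bool = False,
    spot_max_price: Optional[str] = None,
) -> Dict[str, str]:
    """Build flattened query parameters for a single-instance RunInstances call.

    Hypothetical helper for illustration only; it performs no network I/O.
    """
    params = {
        "Action": "RunInstances",
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": "1",
        "MaxCount": "1",
    }
    if spot:
        # The single extra detail: ask for the spot market instead of on-demand.
        params["InstanceMarketOptions.MarketType"] = "spot"
        if spot_max_price is not None:
            # Optional cap; when omitted, the default price applies.
            params["InstanceMarketOptions.SpotOptions.MaxPrice"] = spot_max_price
    return params
```

Calling it with spot=False produces an ordinary on-demand request, which is why no separate spot request bookkeeping is needed in this approach.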
Updated by Peter Amstutz over 6 years ago
Lucas Di Pentima wrote:
Update on the new branch: https://github.com/curoverse/libcloud/commit/5c1656ace02eaa50e89907a2ed085a2843d232a6
Added spot price support
One minor inconsistency: ex_spot_price is documented as being a string input, but your test case uses a floating point input, "ex_spot_price=0.005":
:type ex_spot_price: ``str``
Otherwise this LGTM. (Very happy that we don't need to abstract spot instances in libcloud, since the AWS API actually does it for us.)
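One way to sidestep the string-versus-float question is to accept either and convert to a plain decimal string before building the request. The sketch below assumes the API expects the maximum price as a string; normalize_spot_price is a hypothetical helper, not part of libcloud or of the branch under review.

```python
from decimal import Decimal, InvalidOperation
from typing import Union


def normalize_spot_price(price: Union[str, int, float, Decimal]) -> str:
    """Return the price as a plain decimal string such as '0.005'.

    Hypothetical helper: accepts str, int, float or Decimal input.
    """
    if isinstance(price, bool):
        # bool is a subclass of int; reject it explicitly.
        raise TypeError("spot price must be a number or a numeric string")
    try:
        # Going through str() avoids binary float artifacts like 0.005000000000000000104.
        value = Decimal(str(price))
    except InvalidOperation:
        raise ValueError("spot price is not a valid number: %r" % (price,))
    if not value.is_finite() or value <= 0:
        raise ValueError("spot price must be a positive, finite number")
    # The 'f' format never uses exponent notation.
    return format(value, "f")
```

With this in place, ex_spot_price=0.005 and ex_spot_price="0.005" would both end up as the same string in the request.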
Updated by Lucas Di Pentima over 6 years ago
Fixed documentation, rebased & submitted PR: https://github.com/apache/libcloud/pull/1207
Updated by Lucas Di Pentima over 6 years ago
PR 1207 is still pending. Standing by for news from the libcloud team.
Updated by Lucas Di Pentima over 6 years ago
- Status changed from In Progress to Resolved