Story #13051

Spike - Investigate/prototype AWS spot instance support in libcloud

Added by Tom Morris about 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
04/18/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
2.0
Release:
Release relationship:
Auto

Description

Based on some cursory investigation we'd like to pursue integration of AWS spot instance support into Apache libcloud rather than using boto3 to provide this.

Possible sources for inspiration / code include:


Subtasks

Task #13353: Review https://github.com/curoverse/libcloud/tree/ec2-spot-instances (Resolved, Lucas Di Pentima)


Related issues

Blocks Arvados - Story #7478: [Node Manager] Creates compute nodes using AWS spot instances (Resolved, 05/25/2018)

History

#1 Updated by Tom Morris about 3 years ago

  • Parent task deleted (#7478)

#2 Updated by Tom Morris about 3 years ago

  • Tracker changed from Task to Story

#3 Updated by Tom Morris about 3 years ago

  • Blocks Story #7478: [Node Manager] Creates compute nodes using AWS spot instances added

#4 Updated by Tom Morris about 3 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints

#5 Updated by Tom Morris about 3 years ago

  • Target version changed from Arvados Future Sprints to 2018-04-25 Sprint

#6 Updated by Tom Morris about 3 years ago

  • Assigned To set to Lucas Di Pentima

#7 Updated by Lucas Di Pentima about 3 years ago

  • Status changed from New to In Progress

#8 Updated by Lucas Di Pentima about 3 years ago

Finally got limits upped on my AWS account so I was able to do some testing.

The updates are available at our libcloud fork: https://github.com/curoverse/libcloud/tree/ec2-spot-instances

There's an example script that creates a spot request for a t2.micro instance, asking to pay half the price, then waits for the request to be fulfilled, prints the instance id, and finally stops everything.

The prototype I based this on (https://github.com/muccg/libcloud-drivers) was written as an EC2NodeDriver subclass, but I don't think that's necessary, because a spot instance is just a normal instance that happens to cost less. The only difference is the way they are created, so just adding those methods to the EC2NodeDriver class will simplify our integration with nodemanager, in my opinion.

#9 Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2018-04-25 Sprint to 2018-05-09 Sprint

#10 Updated by Peter Amstutz almost 3 years ago

I think what we want is to make spot instance requests act as much like regular node requests as possible. That way Node Manager can use the same create_node() and destroy_node() methods for spot instances, and we don't have to re-architect Node Manager with a special code path for spot instances.

So, for example, create_node() would check for the keyword argument ex_spot_price and go down an alternate code path that creates a spot instance.

It looks like the most substantial API difference with request_spot_instances() is that create_node() returns a Node object, but request_spot_instances() returns an EC2SpotRequest. Maybe we could cram the extra fields in EC2SpotRequest into the extra field of the Node object.

Similarly, destroy_node() would check if a Node is actually a spot instance, and cancel the spot instance request instead of destroying the node directly.
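To make the proposal concrete, here is a minimal sketch of that dispatch. It is not libcloud code: Node and SketchDriver are stand-ins written for this comment, the _api helper only records which EC2 action would be called, and the ids and the "spot_request_id" / "spot_price" keys in extra are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Stand-in for libcloud's Node with only the fields this sketch needs."""
    id: Optional[str]
    name: str
    extra: dict = field(default_factory=dict)


class SketchDriver:
    """Hypothetical driver; _api is a stub and makes no real EC2 calls."""

    def __init__(self):
        self.calls = []  # records which stubbed API action each method used

    def _api(self, action, result=None):
        self.calls.append(action)
        return result

    def create_node(self, name, ex_spot_price=None):
        if ex_spot_price is None:
            # Regular path: an instance id is available right away.
            return Node(id=self._api("RunInstances", "i-0001"), name=name)
        # Spot path: only a request id exists until the request is fulfilled,
        # so the request details ride along in the node's `extra` dict.
        request_id = self._api("RequestSpotInstances", "sir-0001")
        extra = {"spot_request_id": request_id,
                 "spot_price": str(ex_spot_price)}
        return Node(id=None, name=name, extra=extra)

    def destroy_node(self, node):
        if "spot_request_id" in node.extra:
            # Spot node: cancel the request instead of terminating directly.
            self._api("CancelSpotInstanceRequests")
        else:
            self._api("TerminateInstances")
        return True
```

The caller never branches on the node type: it passes ex_spot_price or not, and calls destroy_node() either way.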

What do you think?

#11 Updated by Lucas Di Pentima almost 3 years ago

Update at commit 4411158c - https://github.com/curoverse/libcloud/commit/4411158cc224b8b9e1482e26514a48d918a8b9ec

As suggested, refactored the code so that spot instance creation works the same way as on-demand instance creation:

  • New class EC2SpotNode acts as a proxy by providing a placeholder Node object and cached access to the spot instance request.
  • Implemented create_node on the EC2NodeDriver class to check whether the ex_spot_price argument is provided and, if so, make a spot request and return EC2SpotNode object(s).
  • Implemented destroy_node on the EC2NodeDriver class so that, before calling the superclass method, it checks whether the request is actually fulfilled; if it isn't, it just cancels the spot request.

Also updated the example script example_compute_ec2spot.py to be able to try the new code. If you want to test an unfulfilled spot request, you can pass ex_spot_price as 0.001 and re-run the script: after 3 tries it'll cancel the request.
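The retry-then-cancel behavior described above can be sketched roughly as follows. This is not the code from example_compute_ec2spot.py: wait_or_cancel and its two callables are hypothetical names used only for illustration, and the delay value is arbitrary.

```python
import time


def wait_or_cancel(get_instance_id, cancel_request, tries=3, delay=5.0):
    """Poll a spot request; cancel it if it is not fulfilled in `tries` polls.

    `get_instance_id` returns the instance id once fulfilled, else None.
    `cancel_request` cancels the pending spot request.
    Both are hypothetical callables supplied by the caller.
    """
    for attempt in range(tries):
        instance_id = get_instance_id()
        if instance_id is not None:
            return instance_id
        if attempt < tries - 1:
            time.sleep(delay)  # no point sleeping after the last poll
    cancel_request()
    return None
```

With a price too low to be fulfilled, get_instance_id keeps returning None, so the request is cancelled after the third poll and the function returns None.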

Pending:
  • Update and create more tests
  • The reboot_node method could also be overridden, although I'm not sure what the behavior should be when asked to reboot a nonexistent node. Maybe just raise an exception.

#12 Updated by Lucas Di Pentima almost 3 years ago

Ahem, I've just found that "recently" (last year), Amazon released a new way of launching Spot Instances, supposedly easier, from the RunInstances action:

https://aws.amazon.com/blogs/compute/new-amazon-ec2-spot-pricing/

...in the docs, this feature is buried under an awkward name, MarketOptions:

https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_InstanceMarketOptionsRequest.html

I don't know whether this new way simplifies things by making spot request accounting unnecessary; there's no clear explanation of how it behaves. I could run some tests to see whether this option returns an instance id as usual. If it does, the needed changes would be minimal and a lot cleaner.

#13 Updated by Lucas Di Pentima almost 3 years ago

Made a new prototype branch: ec2-spot-market - https://github.com/curoverse/libcloud/commit/6ef07ca46ea0f928c426a746d23a578ba0a9dc54

Adding just one detail to the usual RunInstances call is enough to make a spot request at the default price. The call returns the instance id immediately, as with on-demand instances. I think this is the way to go.
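For reference, the extra detail is the market type from the InstanceMarketOptions docs linked above. A rough sketch of the request parameters, assuming a hypothetical helper run_instances_params and a hypothetical ex_spot flag (neither name is taken from the branch), with a placeholder image id:

```python
def run_instances_params(image_id, instance_type, ex_spot=False):
    """Build query parameters for a RunInstances call (hypothetical helper)."""
    params = {
        "Action": "RunInstances",
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": "1",
        "MaxCount": "1",
    }
    if ex_spot:
        # The one extra detail: ask for the spot market instead of on-demand.
        # No price is set here, so the default price applies.
        params["InstanceMarketOptions.MarketType"] = "spot"
    return params


# Placeholder values, for illustration only.
spot_params = run_instances_params("ami-12345678", "t2.micro", ex_spot=True)
```

Everything else in the request stays identical to the on-demand case, which is why the node comes back with an instance id as usual.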

#15 Updated by Peter Amstutz almost 3 years ago

Lucas Di Pentima wrote:

Update on the new branch: https://github.com/curoverse/libcloud/commit/5c1656ace02eaa50e89907a2ed085a2843d232a6

Added spot price support

One minor inconsistency: ex_spot_price is documented as a string input, but your test case uses a floating-point input, "ex_spot_price=0.005".

:type       ex_spot_price: ``str``

Otherwise this LGTM. (Very happy that we don't need to abstract spot instances in libcloud, because the AWS API actually does it for us.)
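One way to sidestep the str/float question would be to accept either and normalize before building the request. A small sketch, not taken from the branch: spot_price_params is a hypothetical helper, and it assumes the price ends up in the SpotOptions MaxPrice field of InstanceMarketOptions, which the AWS API defines as a string.

```python
from decimal import Decimal


def spot_price_params(ex_spot_price):
    """Return the spot max-price query parameter as a plain decimal string."""
    # Going through str() first avoids binary float noise when the caller
    # passes a float; the "f" format then avoids scientific notation for
    # very small prices (str(0.00001) alone would give "1e-05").
    price = format(Decimal(str(ex_spot_price)), "f")
    return {"InstanceMarketOptions.SpotOptions.MaxPrice": price}
```

Both spot_price_params(0.005) and spot_price_params("0.005") then produce the same parameter value.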

#16 Updated by Lucas Di Pentima almost 3 years ago

Fixed documentation, rebased & submitted PR: https://github.com/apache/libcloud/pull/1207

#17 Updated by Lucas Di Pentima almost 3 years ago

The PR 1207 is still pending. Standing by for news from the libcloud team.

#18 Updated by Lucas Di Pentima almost 3 years ago

  • Status changed from In Progress to Resolved

#19 Updated by Tom Morris over 2 years ago

  • Release set to 13
