Idea #4299

closed

[Node Manager] Support a configurable throttle for creating cloud nodes

Added by Brett Smith about 10 years ago. Updated about 8 years ago.

Status: Rejected
Priority: Normal
Assigned To: -
Category: Node Manager
Target version: -
Start date: 10/23/2014
Due date:
Story points: 1.0

Description

You should be able to specify that the daemon must not create more than N cloud nodes within any period of length T. While it is at this limit, the daemon should decline to create new cloud nodes.


Related issues: 2 (1 open, 1 closed)

Related to Arvados - Feature #5712: Smarter node allocation for small jobs (Closed, 04/13/2015)
Related to Arvados - Feature #5768: [Node Manager] Add max_node_per_user config (New)

#1

Updated by Brett Smith about 10 years ago

  • Tracker changed from Bug to Idea

#2

Updated by Ward Vandewege about 10 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints

#3

Updated by Ward Vandewege about 10 years ago

  • Target version changed from Arvados Future Sprints to 2014-11-19 sprint

#4

Updated by Brett Smith about 10 years ago

  • Assigned To set to Brett Smith

#5

Updated by Brett Smith about 10 years ago

This is going to take a little more rearchitecting of Node Manager than I originally thought.

Imagine the case where there's one job in the queue, and it wants N nodes, where N > M, the number of nodes allowed during the throttle period. There is no point in throttling the creation of these nodes, because otherwise you'll just have M nodes sitting idle until the throttle re-opens. So the throttle will have to accept atomic requests: when it's open, the daemon can say, "Okay, I'm starting N nodes," and they'll all count toward the throttle, even if just one of them would be sufficient to close it.

But in order to make atomic requests, ServerCalculator will need to send more information to the daemon. Right now it's just a flat list of node sizes, so the daemon has no way of knowing which nodes are for which job(s). This should probably be changed to a list of 2-tuples of (size, count). We'll probably also ditch the _nodes_wanted() method of the daemon, instead replacing it with something like _should_start_nodes(size, count) that checks against max_nodes, the throttle, and the number of idle nodes, and returns a boolean to say go or no go.
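
A minimal sketch of the proposed go/no-go check, under several assumptions: should_start_nodes() is a hypothetical free-function stand-in for the daemon's _should_start_nodes(size, count); node sizes are plain strings rather than whatever size objects the daemon actually handles; and the running count, max_nodes cap, and idle-node map are simplified stand-ins, not Node Manager's real data structures. The throttle is passed in as any callable with the add() semantics from the next paragraph.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical ServerCalculator output after the proposed change:
# one (size, count) pair per queued job instead of a flat list of sizes.
NodeRequest = Tuple[str, int]


def should_start_nodes(size: str, count: int, *, running: int, max_nodes: int,
                       idle: Dict[str, int],
                       throttle_add: Callable[[int], bool]) -> bool:
    """Return True ("go") if all `count` nodes of `size` should be started now."""
    # Enough idle nodes of this size already exist to run the job.
    if idle.get(size, 0) >= count:
        return False
    # Starting the whole batch must not exceed the overall node cap.
    if running + count > max_nodes:
        return False
    # Check the throttle last: its add() records the nodes when it says yes,
    # so requests rejected above never use up throttle capacity.
    return throttle_add(count)


# Example: two jobs, with a throttle stub that is always open.
requests: List[NodeRequest] = [("small", 1), ("large", 4)]
running = 2
approved: List[NodeRequest] = []
for size, count in requests:
    if should_start_nodes(size, count, running=running, max_nodes=8,
                          idle={"small": 1}, throttle_add=lambda n: True):
        approved.append((size, count))
        running += count
print(approved)  # [('large', 4)]
```

Ordering matters here: because the throttle's add() both checks and records, calling it only after the cheaper checks pass keeps a request that would be rejected anyway from closing the throttle.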

I'm imagining that we'll have a ServerThrottle class with just one method, add, that takes the number of servers to be created. The method is atomic: if the request is allowed, it adds the servers to the throttle tally and returns True; otherwise, it returns False. Internally, it'll keep track of the tally with a collections.deque holding, for each counted node, the time at which it leaves the throttle period (its request time plus T). At the start of every add(), it calls popleft() while que and (que[0] < time.time()).
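
The throttle described above could look roughly like this. It is a sketch, not the class that was ever written (the issue was rejected). Each admitted node is stored as the time it leaves the window, so pruning just compares the oldest entry with the current time. Two choices are assumptions of this sketch: the clock parameter, added so the window can be tested, and the default of time.monotonic() instead of time.time(), which keeps wall-clock adjustments from reopening or stalling the window.

```python
import collections
import time
from typing import Callable, Deque


class ServerThrottle:
    """Sliding-window throttle: open while fewer than `limit` nodes were
    admitted during the last `period` seconds."""

    def __init__(self, limit: int, period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.period = period
        self._clock = clock
        # One entry per admitted node: the time it leaves the window.
        # Entries are appended in time order, so the oldest is always at the left.
        self._expirations: Deque[float] = collections.deque()

    def add(self, count: int) -> bool:
        """Atomically admit `count` new servers if the throttle is open."""
        if count < 1:
            raise ValueError("count must be positive")
        now = self._clock()
        # Forget nodes whose window has passed.
        while self._expirations and self._expirations[0] <= now:
            self._expirations.popleft()
        if len(self._expirations) >= self.limit:
            return False  # Closed: decline the whole request.
        # Open: count every node in the batch, even if that overshoots the limit.
        self._expirations.extend([now + self.period] * count)
        return True


# Usage with a fake clock so the window can be stepped through by hand.
fake_now = [0.0]
throttle = ServerThrottle(limit=3, period=60.0, clock=lambda: fake_now[0])
print(throttle.add(5))  # True: open, so all 5 nodes are admitted at once
print(throttle.add(1))  # False: 5 nodes still in the window
fake_now[0] = 60.0
print(throttle.add(1))  # True: the first batch has expired
```

The first call shows the atomic behavior from the scenario above: with a limit of 3, a job needing 5 nodes still gets all 5 while the throttle is open, rather than 3 now and 2 after the period.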

#6

Updated by Brett Smith about 10 years ago

  • Target version changed from 2014-11-19 sprint to 2014-12-10 sprint

#7

Updated by Tom Clegg about 10 years ago

  • Target version changed from 2014-12-10 sprint to Arvados Future Sprints

#8

Updated by Tom Morris about 8 years ago

  • Assigned To changed from Brett Smith to Tom Morris

#9

Updated by Tom Morris about 8 years ago

  • Status changed from New to Rejected
  • Assigned To deleted (Tom Morris)
  • Target version deleted (Arvados Future Sprints)

There may be pathological conditions, but we should deal with this some other way.
