This is going to take a little more rearchitecting of Node Manager than I originally thought.
Imagine the case where there's one job in the queue, and it wants N nodes, where N > M, the number of nodes allowed during the throttle period. There is no point in throttling the creation of these nodes, because if you do, you'll just have M nodes sitting idle until the throttle re-opens. So the throttle will have to accept atomic requests: when it's open, the daemon can say, "Okay, I'm starting N nodes," and they'll all count toward the throttle, even if just one of them would be sufficient to close it.
But in order to make atomic requests, ServerCalculator will need to send more information to the daemon. Right now it's just a flat list of node sizes, so the daemon has no way of knowing which nodes are for which job(s). This should probably be changed to a list of 2-tuples of (size, count). We'll probably also ditch the _nodes_wanted() method of the daemon, instead replacing it with something like _should_start_nodes(size, count) that checks against max_nodes, the throttle, and the number of idle nodes, and returns a boolean to say go or no go.
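To make the go/no-go idea concrete, here's a rough sketch of what the check could look like as a standalone function. Everything beyond the (size, count) pair is an assumption for illustration: idle_by_size, booted, and throttle_allows are hypothetical names, and treating "enough idle nodes of this size" as a no-go is my guess at how idle nodes would factor in, not settled design.

```python
def should_start_nodes(size, count, idle_by_size, booted, max_nodes,
                       throttle_allows):
    """Return True if the whole request for `count` nodes of `size` may start.

    Hypothetical parameters (not existing Node Manager API):
      idle_by_size    -- dict mapping size name to number of idle nodes
      booted          -- number of nodes currently up or booting
      max_nodes       -- configured ceiling on total nodes
      throttle_allows -- callable taking a count, returning True/False
    """
    # Assumption: idle nodes of the same size can already serve the request.
    if idle_by_size.get(size, 0) >= count:
        return False
    # The request is all-or-nothing, so the full count must fit under the cap.
    if booted + count > max_nodes:
        return False
    # Ask the throttle last: if it records the request as a side effect,
    # it should only do so once every other check has passed.
    return throttle_allows(count)


# One (size, count) entry per job, as proposed for ServerCalculator's output.
wishlist = [("small", 3), ("large", 1)]
decisions = [
    should_start_nodes(size, count, {"large": 1}, booted=2, max_nodes=10,
                       throttle_allows=lambda n: True)
    for size, count in wishlist
]
```

Here the "small" request gets a go and the "large" one doesn't, since an idle large node already exists.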
I'm imagining that we'll have a ServerThrottle class with just one method, add(), that takes the number of servers to be created. The method is atomic: if the request is allowed, it adds the servers to the throttle tally and returns True to indicate that the request is allowed. Otherwise, it returns False and leaves the tally unchanged. Internally, it'll keep track of the tally with a collections.deque of node request times, trimming stale entries with popleft() at the start of every add(): while que and (que[0] < time.time()): que.popleft().
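A minimal sketch of the throttle described above, under a few assumptions of mine: the constructor arguments (max_starts, period, clock) are hypothetical, the throttle counts as "open" while the tally is below max_starts, and each deque entry stores the time at which that node ages out of the window, so that the que[0] < time.time() comparison works as written. There's no locking here; "atomic" only means all-or-nothing.

```python
import collections
import time


class ServerThrottle:
    """All-or-nothing throttle on node creation (illustrative sketch)."""

    def __init__(self, max_starts, period, clock=time.time):
        self.max_starts = max_starts   # starts allowed per throttle period
        self.period = period           # throttle window, in seconds
        self._clock = clock            # injectable for testing
        self._que = collections.deque()  # expiry time of each counted node

    def add(self, count):
        """Record `count` node starts if the throttle is open.

        Returns True and tallies all `count` nodes when open, even if that
        overshoots max_starts; returns False without changes when closed.
        """
        now = self._clock()
        # Drop entries that have aged out of the throttle window.
        while self._que and (self._que[0] < now):
            self._que.popleft()
        if len(self._que) >= self.max_starts:
            return False
        self._que.extend([now + self.period] * count)
        return True


# Usage with a fake clock: M = 2 starts per 60 seconds, one job wanting 5.
fake_now = [0.0]
throttle = ServerThrottle(max_starts=2, period=60,
                          clock=lambda: fake_now[0])
first = throttle.add(5)    # open, so all 5 nodes are accepted at once
second = throttle.add(1)   # closed: tally is 5, over the limit of 2
fake_now[0] = 61.0
third = throttle.add(1)    # the earlier entries have expired; open again
```

Because every entry in one request shares the same expiry, the whole batch leaves the tally together, which keeps the deque sorted without any extra work.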