Idea #6313

[Node Manager] Booting nodes shouldn't satisfy min_nodes

Added by Brett Smith almost 9 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: Node Manager
Target version: -
Story points: 1.0

Description

The scenario

  • Imagine Node Manager is configured with min_nodes = 1, running on a cluster where a compute node is sitting idle.
  • Two jobs are submitted simultaneously.
  • Node Manager checks the job queue and, seeing more jobs than idle nodes, starts booting a new node.
  • The two jobs both run very quickly on the idle node.
  • The next time Node Manager polls the job queue, it's empty.

Currently, in this situation, Node Manager will shut down the idle node. It wants to shut down something because it's managing more nodes than it needs (2 > 1), and it can't shut down the booting node because that node's shutdown window isn't open yet. This is surprising to users: from the Dashboard, the number of running nodes appears to drop below min_nodes, since the booting node hasn't pinged Arvados yet.

Proposed fix

Node Manager should decline to shut down a node if doing so would cause the number of paired nodes to fall below min_nodes.
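
As an illustration only, a minimal sketch of that check, using hypothetical names rather than the real Node Manager API:

    # Hypothetical sketch of the proposed shutdown guard; all names are
    # illustrative, not actual Node Manager code.
    def eligible_for_shutdown(node, paired_node_count, min_nodes):
        # Refuse to shut down a paired node if doing so would leave fewer
        # than min_nodes paired nodes.
        if node.is_paired and (paired_node_count - 1) < min_nodes:
            return False
        # Otherwise keep the existing rule: only shut down a node whose
        # shutdown window is open.
        return node.in_shutdown_window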

Possible extension: Node Manager should boot a node if fewer than min_nodes nodes are paired with Arvados, unless that would cause the number of cloud nodes to exceed max_nodes. Assume that fresh nodes will eventually pair with Arvados; they just haven't yet.
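
The boot side could look roughly like this (again, just a sketch with made-up names):

    # Hypothetical sketch of the proposed boot rule; all names are illustrative.
    def should_boot_node(paired_node_count, cloud_node_count, min_nodes, max_nodes):
        # Boot a node when fewer than min_nodes nodes are paired with Arvados,
        # unless that would push the cloud node count past max_nodes.
        return paired_node_count < min_nodes and cloud_node_count < max_nodes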

This is just one idea. Other solutions may be better.

Actions #1

Updated by Ward Vandewege over 3 years ago

  • Status changed from New to Closed
Actions #2

Updated by Ward Vandewege over 3 years ago

  • Target version deleted (Arvados Future Sprints)