Idea #12383

closed

[Nodemanager] Explicit node record states

Added by Peter Amstutz over 6 years ago. Updated about 4 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
Story points: -

Description

Proposed node record states

  • Requested - create request for node size X will be sent
  • Assigned - create request returned a cloud node id, waiting to pair
  • Paired - cloud node has pinged the API server, indicating that it has completed initialization, and is busy or idle (ready to accept work)
    • don't record every idle<-->busy transition (slurm is the source of truth here)
    • from here the transition table in nodemanager decides when to go into drain/shutdown state. this is based on:
      • node status: busy/down/idle/unpaired
      • shutdown window open/closed (AWS billing optimization, could be removed)
      • boot wait or boot exceeded
      • idle wait or idle exceeded (how long to wait for more work, currently not implemented)
  • Draining - will set "drain" state in SLURM, wait for work to complete
  • Shutdown - shutdown request will be sent
  • Gone - corresponding cloud node is no longer present in the cloud nodes table, record can be safely deleted.
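
The proposed states can be sketched as an enumeration with a table of permitted transitions. This is only an illustration: the proposal names the states but does not spell out every legal transition, so `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical, and the shortcut edges into "Shutdown" are assumptions.

```python
from enum import Enum


class NodeState(Enum):
    """Node record states named in the proposal."""
    REQUESTED = "Requested"
    ASSIGNED = "Assigned"
    PAIRED = "Paired"
    DRAINING = "Draining"
    SHUTDOWN = "Shutdown"
    GONE = "Gone"


# Assumed transition table (not specified in the proposal).
# The main path follows the order of the list above; the edges from
# Assigned and Paired directly to Shutdown are guesses covering
# "boot exceeded" and an externally requested shutdown.
ALLOWED_TRANSITIONS = {
    NodeState.REQUESTED: {NodeState.ASSIGNED},
    NodeState.ASSIGNED: {NodeState.PAIRED, NodeState.SHUTDOWN},
    NodeState.PAIRED: {NodeState.DRAINING, NodeState.SHUTDOWN},
    NodeState.DRAINING: {NodeState.SHUTDOWN},
    NodeState.SHUTDOWN: {NodeState.GONE},
    NodeState.GONE: set(),
}


def can_transition(current: NodeState, target: NodeState) -> bool:
    """Return True if the assumed table permits current -> target."""
    return target in ALLOWED_TRANSITIONS[current]
```

Idle and busy are deliberately absent from the enumeration, since the proposal keeps both under "Paired" and leaves that distinction to slurm.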

Changes:

API server gets explicit "state", "node_size", and "cloud_node" columns.

Node manager determines its next action from the state in the nodes table, and responds to external changes to that state. To create a new node, create a node record in "Requested" state. To shut down a node, set its state to "Draining" or "Shutdown".
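
A minimal sketch of how the next action might be derived from a node record, assuming a record carries the three proposed columns. `NodeRecord`, `next_action`, and the action names are hypothetical and are not part of nodemanager or the API server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeRecord:
    """Hypothetical in-memory view of a row in the nodes table."""
    state: str
    node_size: str
    cloud_node: Optional[str] = None  # cloud node id, unknown until Assigned


# Assumed mapping from record state to what the node manager does next.
# The action names are illustrative labels only.
ACTIONS = {
    "Requested": "send_create_request",
    "Assigned": "wait_for_pairing",
    "Paired": "evaluate_transition_table",
    "Draining": "set_slurm_drain_and_wait",
    "Shutdown": "send_shutdown_request",
    "Gone": "delete_record",
}


def next_action(record: NodeRecord) -> str:
    """Pick the next action purely from the stored state."""
    if record.state not in ACTIONS:
        raise ValueError(f"unknown node state: {record.state!r}")
    return ACTIONS[record.state]


# An external actor changes the state; the next poll picks it up.
node = NodeRecord(state="Paired", node_size="small", cloud_node="i-123")
node.state = "Shutdown"
print(next_action(node))
```

Because the decision depends only on the stored state, an operator or another service could request a shutdown just by updating the record.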

Wishlist items are fulfilled by creating a new node record in "Requested" state.
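
Wishlist fulfillment might look like the sketch below. The function name `fulfill_wishlist` and the dict-based records are hypothetical. The rule that nodes already in "Requested" or "Assigned" state count against the wishlist is an assumption added here to avoid duplicate requests; the proposal does not state it.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def fulfill_wishlist(wishlist: List[str], nodes: List[Record]) -> List[Record]:
    """Append a "Requested" record for each wishlist size not already pending.

    Assumption: a node still in "Requested" or "Assigned" state will
    eventually satisfy one wishlist entry of the same size.
    """
    pending = Counter(
        n["node_size"] for n in nodes if n["state"] in ("Requested", "Assigned")
    )
    created = []
    for size in wishlist:
        if pending[size] > 0:
            pending[size] -= 1  # an in-flight node covers this entry
            continue
        record: Record = {"state": "Requested", "node_size": size, "cloud_node": None}
        nodes.append(record)
        created.append(record)
    return created
```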


Actions #1

Updated by Peter Amstutz over 6 years ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz about 4 years ago

  • Status changed from New to Closed