Story #12383

[Nodemanager] Explicit node record states

Added by Peter Amstutz 2 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

Proposed node record states

  • Requested - create request for node size X will be sent
  • Assigned - create request returned a cloud node id, waiting to pair
  • Paired - cloud node has pinged the API server, indicating that it has completed initialization, and is busy or idle (ready to accept work)
    • don't record every idle<-->busy transition (slurm is the source of truth here)
    • from here the transition table in nodemanager decides when to go into drain/shutdown state. this is based on:
      • node status: busy/down/idle/unpaired
      • shutdown window open/closed (AWS billing optimization, could be removed)
      • boot wait or boot exceeded
      • idle wait or idle exceeded (how long to wait for more work, currently not implemented)
  • Draining - will set "drain" state in SLURM, wait for work to complete
  • Shutdown - shutdown request will be sent
  • Gone - corresponding cloud node is no longer present in the cloud nodes table, record can be safely deleted.
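As a discussion aid, the proposed states could be sketched as an enum with a table of allowed transitions. This is only a minimal sketch: the names NodeState, ALLOWED and can_transition are hypothetical, and the specific edges (for example Assigned going straight to Shutdown when boot wait is exceeded) are assumptions, not something the proposal fixes.

```python
from enum import Enum


class NodeState(Enum):
    """Proposed node record states (names taken from the list above)."""
    REQUESTED = "Requested"
    ASSIGNED = "Assigned"
    PAIRED = "Paired"
    DRAINING = "Draining"
    SHUTDOWN = "Shutdown"
    GONE = "Gone"


# Assumed transition edges; the story does not spell these out.
ALLOWED = {
    NodeState.REQUESTED: {NodeState.ASSIGNED},
    # Assumption: a node that never pairs (boot exceeded) is shut down directly.
    NodeState.ASSIGNED: {NodeState.PAIRED, NodeState.SHUTDOWN},
    NodeState.PAIRED: {NodeState.DRAINING, NodeState.SHUTDOWN},
    NodeState.DRAINING: {NodeState.SHUTDOWN},
    NodeState.SHUTDOWN: {NodeState.GONE},
    NodeState.GONE: set(),  # terminal: the record can be deleted
}


def can_transition(current, target):
    """Return True if moving from current to target is an allowed edge."""
    return target in ALLOWED[current]
```

For example, can_transition(NodeState.PAIRED, NodeState.DRAINING) is allowed under these assumptions, while nothing leads out of NodeState.GONE.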

Changes:

API server gets explicit "state", "node_size", and "cloud_node" columns.

Node manager determines next action based on state in nodes table, and is responsive to external changes to state. To create a new node, create a node record in "Requested" state. To shut down a node, set its state to "Draining" or "Shutdown".

Wishlist items are fulfilled by creating a new node record in "Requested" state.
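A rough sketch of how this could look, assuming node records are plain dicts with the three proposed columns. The helper names (new_node_record, fulfill_wishlist, next_action) and the action descriptions are hypothetical and only paraphrase the state list in the description; they are not existing node manager code.

```python
def new_node_record(node_size):
    """Build a record with the proposed columns, starting in "Requested"."""
    return {"state": "Requested", "node_size": node_size, "cloud_node": None}


def fulfill_wishlist(wishlist):
    """Create one "Requested" record per wishlist item (a list of node sizes)."""
    return [new_node_record(size) for size in wishlist]


# Hypothetical mapping from record state to what node manager does next.
NEXT_ACTION = {
    "Requested": "send create request",
    "Assigned": "wait for cloud node to pair",
    "Paired": "consult transition table",
    "Draining": "set drain in SLURM and wait for work to complete",
    "Shutdown": "send shutdown request",
    "Gone": "delete record",
}


def next_action(record):
    """Decide the next action purely from the record's current state."""
    return NEXT_ACTION[record["state"]]


records = fulfill_wishlist(["small", "large"])
# An external actor can request shutdown just by changing the state column.
records[1]["state"] = "Draining"
actions = [next_action(r) for r in records]
```

Because the decision depends only on the stored state, an external change to a record (such as setting it to "Draining") is picked up the next time node manager reads the table.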

Associated revisions

Revision dc060ea2 (diff)
Added by Nico César 2 months ago

initial diagrams to discuss

refs #12383

History

#1 Updated by Peter Amstutz 2 months ago

  • Description updated (diff)

#2 Updated by Peter Amstutz 2 months ago

  • Description updated (diff)

#3 Updated by Peter Amstutz 2 months ago

  • Description updated (diff)

#4 Updated by Peter Amstutz 2 months ago

  • Description updated (diff)
