Story #12383

[Nodemanager] Explicit node record states

Added by Peter Amstutz 17 days ago. Updated 17 days ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Story points: -
Velocity based estimate: -

Description

Proposed node record states

  • Requested - create request for node size X will be sent
  • Assigned - create request returned a cloud node id, waiting to pair
  • Paired - cloud node has pinged the API server indicating that it has completed initialization, and is busy or idle (ready to accept work)
    • don't record every idle<-->busy transition (SLURM is the source of truth here)
    • from here, the transition table in nodemanager decides when to move to the Draining or Shutdown state, based on:
      • node status: busy/down/idle/unpaired
      • shutdown window open/closed (AWS billing optimization, could be removed)
      • boot wait or boot exceeded
      • idle wait or idle exceeded (how long to wait for more work, currently not implemented)
  • Draining - will set "drain" state in SLURM, wait for work to complete
  • Shutdown - shutdown request will be sent
  • Gone - corresponding cloud node is no longer present in the cloud nodes table, record can be safely deleted.
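The proposed lifecycle can be read as a small state machine. The Python sketch below is only an illustration: `NodeState`, `ALLOWED`, `can_transition()` and `next_state_from_table()` are hypothetical names, and both the set of legal transitions and the decision rules inside the transition-table function are assumptions, not part of the proposal.

```python
from enum import Enum


class NodeState(Enum):
    """Proposed node record states (names taken from the list above)."""
    REQUESTED = "Requested"
    ASSIGNED = "Assigned"
    PAIRED = "Paired"
    DRAINING = "Draining"
    SHUTDOWN = "Shutdown"
    GONE = "Gone"


# Assumed legal transitions; the proposal does not spell these out.
ALLOWED = {
    NodeState.REQUESTED: {NodeState.ASSIGNED, NodeState.GONE},
    NodeState.ASSIGNED: {NodeState.PAIRED, NodeState.SHUTDOWN, NodeState.GONE},
    NodeState.PAIRED: {NodeState.DRAINING, NodeState.SHUTDOWN, NodeState.GONE},
    NodeState.DRAINING: {NodeState.SHUTDOWN, NodeState.GONE},
    NodeState.SHUTDOWN: {NodeState.GONE},
    NodeState.GONE: set(),
}


def can_transition(current, target):
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED[current]


def next_state_from_table(status, window_open, boot_exceeded, idle_exceeded):
    """Hypothetical transition-table lookup for a node that has been created.

    status is one of "busy", "down", "idle", "unpaired".
    Returns the next NodeState, or None to leave the node alone.
    All rules below are illustrative assumptions.
    """
    if status == "busy":
        return None  # never interrupt running work
    if status == "unpaired" and boot_exceeded:
        return NodeState.SHUTDOWN  # node never finished booting
    if not window_open:
        return None  # outside the shutdown window: keep the node
    if status == "down":
        return NodeState.SHUTDOWN  # assumed: a down node has no work to drain
    if status == "idle" and idle_exceeded:
        return NodeState.DRAINING  # stop SLURM scheduling, then shut down
    return None
```

Keeping the legal transitions in one table makes it easy to reject an invalid external state change (for example, Gone back to Paired) before acting on it.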

Changes:

The API server gets explicit "state", "node_size", and "cloud_node" columns in the nodes table.
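A minimal sketch of what a node record could look like with the new columns. The `NodeRecord` class, the `uuid` field, the column types, and the `assign()` helper are all assumptions for illustration; the proposal only names the three columns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeRecord:
    """Hypothetical in-memory view of a nodes-table row with the new columns."""
    uuid: str                         # assumed record identifier
    state: str                        # e.g. "Requested", "Assigned", "Paired", ...
    node_size: str                    # cloud size name requested (example: "m4.large")
    cloud_node: Optional[str] = None  # cloud node id, set once the create request returns

    def assign(self, cloud_node_id):
        """Record the cloud node id returned by the create request."""
        if self.state != "Requested":
            raise ValueError(f"cannot assign a node in state {self.state!r}")
        self.cloud_node = cloud_node_id
        self.state = "Assigned"
```

Storing `node_size` on the record means a Requested row already carries everything needed to send the create request, before any cloud node exists.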

The node manager determines its next action based on the state in the nodes table, and is responsive to external changes to that state. To create a new node, create a node record in "Requested" state. To shut down a node, set its state to "Draining" or "Shutdown".
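One way to picture a node manager driven entirely by the stored state is a reconcile pass that re-derives an action for every record each time it runs. The action names, `next_action()`, `reconcile()`, and the dict-shaped records are hypothetical; a real loop would also need to avoid re-sending requests that are already in flight.

```python
# Hypothetical action taken for each proposed state.
ACTIONS = {
    "Requested": "send_create_request",
    "Assigned": "wait_for_ping",
    "Paired": "evaluate_transition_table",
    "Draining": "drain_in_slurm_and_wait",
    "Shutdown": "send_shutdown_request",
    "Gone": "delete_record",
}


def next_action(state, cloud_node_present):
    """Decide what to do with one node record on this pass.

    Because the action is derived from the stored state on every pass,
    an external client that edits the state (e.g. sets it to "Shutdown")
    changes what happens next without any other signal.
    """
    if state not in ACTIONS:
        raise ValueError(f"unknown node state {state!r}")
    # Once a record has a cloud node, its disappearance from the cloud
    # listing means the record can be marked Gone.
    if state not in ("Requested", "Gone") and not cloud_node_present:
        return "mark_gone"
    return ACTIONS[state]


def reconcile(records, cloud_node_ids):
    """Return (uuid, action) pairs for a list of record dicts."""
    plan = []
    for rec in records:
        present = rec.get("cloud_node") in cloud_node_ids
        plan.append((rec["uuid"], next_action(rec["state"], present)))
    return plan
```

For example, `reconcile([{"uuid": "a", "state": "Shutdown", "cloud_node": "c1"}], {"c1"})` plans a shutdown request for record `a`.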

Wishlist items are fulfilled by creating a new node record in "Requested" state.
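A sketch of turning a wishlist into new "Requested" records, assuming the wishlist is a list of node size names (one entry per wanted node). `records_to_create()`, `AVAILABLE_STATES`, and the `busy_uuids` parameter are hypothetical; because the proposal leaves busy/idle to SLURM, busy nodes are passed in separately rather than read from the record state.

```python
from collections import Counter

# Assumption: records being drained, shut down, or gone do not count as capacity.
AVAILABLE_STATES = {"Requested", "Assigned", "Paired"}


def records_to_create(wishlist, records, busy_uuids=frozenset()):
    """Return new node records (as dicts) needed to satisfy the wishlist.

    wishlist:   list of node size names, one entry per wanted node.
    records:    existing records as dicts with "uuid", "state", "node_size".
    busy_uuids: uuids SLURM reports as busy; they cannot take new work.
    """
    wanted = Counter(wishlist)
    available = Counter(
        r["node_size"]
        for r in records
        if r["state"] in AVAILABLE_STATES and r["uuid"] not in busy_uuids
    )
    new = []
    for size in sorted(wanted):
        # range() of a zero or negative shortfall yields nothing.
        for _ in range(wanted[size] - available[size]):
            new.append({"state": "Requested", "node_size": size})
    return new
```

With one Paired "small" record and one Draining "small" record, a wishlist of two "small" nodes yields one new Requested record, since the draining node is not counted.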

compute-nodes-state-diagram-current.png (49.7 KB) Nico César, 10/02/2017 05:41 pm

compute-nodes-state-diagram-proposed.png (38.8 KB) Nico César, 10/02/2017 05:41 pm

Associated revisions

Revision dc060ea2
Added by Nico César 17 days ago

initial diagrams to discuss

refs #12383

History

#1 Updated by Peter Amstutz 17 days ago

  • Description updated (diff)

#2 Updated by Peter Amstutz 17 days ago

  • Description updated (diff)

#3 Updated by Peter Amstutz 17 days ago

  • Description updated (diff)

#4 Updated by Peter Amstutz 17 days ago

  • Description updated (diff)

#5 Updated by Nico César 17 days ago

checkout dc060ea2f05e3266562c449fff39b3e867041f84

We have .dot files to play with. I attached some PNGs of their output.
