Idea #12383
Updated by Peter Amstutz over 6 years ago
Suggested list of node states
* Requested - create request for node size X will be sent
* Assigned - create request returned a cloud node id, waiting to pair
* Paired - cloud node has pinged the API server, indicating it has completed initialization, and is busy or idle (ready to accept work)
** don't record every idle<-->busy transition (slurm is the source of truth here)
** from here the transition table in nodemanager decides when to go into drain/shutdown state. this is based on:
*** node status: busy/down/idle/unpaired
*** shutdown window open/closed (AWS billing optimization, could be removed)
*** boot wait or boot exceeded
*** idle wait or idle exceeded (how long to wait for more work, currently not implemented)
* Drain - will set "drain" state in SLURM, wait for work to complete, then transition to shutdown
* Shutdown - shutdown request will be sent
* Gone - corresponding cloud node is no longer present in the cloud nodes table.
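The state list above could be sketched as a small enum plus a table of allowed transitions. This is only an illustration, not existing node manager code: the names @NodeState@, @ALLOWED@ and @can_transition@ are hypothetical, and the transition set (a linear lifecycle, plus Paired going straight to Shutdown) is an assumption read off the descriptions.

```python
from enum import Enum


class NodeState(Enum):
    """Node states from the suggested list (names are illustrative)."""
    REQUESTED = "Requested"
    ASSIGNED = "Assigned"
    PAIRED = "Paired"
    DRAIN = "Drain"
    SHUTDOWN = "Shutdown"
    GONE = "Gone"


# Assumed transitions: a linear lifecycle, with Paired also allowed to
# skip Drain and go directly to Shutdown. Not confirmed by the proposal.
ALLOWED = {
    NodeState.REQUESTED: {NodeState.ASSIGNED},
    NodeState.ASSIGNED: {NodeState.PAIRED},
    NodeState.PAIRED: {NodeState.DRAIN, NodeState.SHUTDOWN},
    NodeState.DRAIN: {NodeState.SHUTDOWN},
    NodeState.SHUTDOWN: {NodeState.GONE},
    NodeState.GONE: set(),
}


def can_transition(current, target):
    """Return True if moving from current to target is in the assumed table."""
    return target in ALLOWED[current]
```

Idle/busy changes are deliberately absent from the table, since both are covered by the single Paired state and slurm remains the source of truth for them.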
Changes:
Explicit "state", "node_size", and "cloud_node" columns.
Control nodes by setting state.
Node manager determines the next action based on the state in the nodes table. To create a new node, create a node record in "Requested" state. To shut down a node, set its state to "Drain" or "Shutdown".
Instead of having node manager determine the wishlist, crunch-dispatch-slurm could create node requests when it wants node manager to start a new node.
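A rough sketch of how the proposed changes might fit together, assuming an in-memory list stands in for the nodes table. Everything here is hypothetical: @NodeRecord@, @request_node@, @next_action@ and the action names are invented for illustration and are not part of node manager or crunch-dispatch-slurm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeRecord:
    """One row of the nodes table with the three explicit columns."""
    state: str
    node_size: str
    cloud_node: Optional[str] = None  # filled in once the create request returns an id


def request_node(table, node_size):
    """What a dispatcher might do: add a record in the "Requested" state."""
    record = NodeRecord(state="Requested", node_size=node_size)
    table.append(record)
    return record


def next_action(record):
    """What node manager might do: pick an action purely from the state column.

    Returns a hypothetical action name, or None when the state needs no
    action from node manager in this sketch.
    """
    actions = {
        "Requested": "send_create_request",
        "Drain": "set_slurm_drain",
        "Shutdown": "send_shutdown_request",
    }
    return actions.get(record.state)
```

For example, calling @request_node(table, "small")@ and then @next_action@ on the new record yields @"send_create_request"@; setting the record's state to "Drain" changes the result to @"set_slurm_drain"@, so the state column alone drives what node manager does.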