Idea #12383
Updated by Peter Amstutz over 6 years ago
Suggested list of node states:

* Requested - create request for node size X will be sent
* Assigned - create request returned a cloud node id, waiting to pair
* Paired - cloud node has pinged the API server, indicating it has completed initialization, and is busy or idle (ready to accept work)
** don't record every idle<-->busy transition (slurm is the source of truth here)
** from here the transition table in nodemanager decides when to go into drain/shutdown state. This is based on:
*** node status: busy/down/idle/unpaired
*** shutdown window open/closed (AWS billing optimization, could be removed)
*** boot wait or boot exceeded
*** idle wait or idle exceeded (how long to wait for more work, currently not implemented)
* Drain - will set "drain" state in SLURM, wait for work to complete, then transition to shutdown
* Shutdown - shutdown request will be sent
* Gone - corresponding cloud node is no longer present in the cloud nodes table, record can be safely deleted

Changes to the nodes table: explicit "state", "node_size", and "cloud_node" columns.

Control nodes by setting state. Node manager determines the next action based on the state in the nodes table: to create a new node, create a node record in "Requested" state. To shut down a node, set its state to "Drain" or "Shutdown".

Consider instead: rather than having node manager determine the wishlist, crunch-dispatch-slurm could create a new node record in "Requested" state when it wants node manager to start a new node.
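
The proposed states and the "control nodes by setting state" idea could be sketched roughly as follows. This is only an illustration, not existing node manager code: the names @NodeState@, @NodeRecord@, @TRANSITIONS@ and @next_action@ are hypothetical, and the set of allowed transitions is an assumption read off the list above (in particular, Assigned -> Shutdown for a node that exceeds its boot wait, and Paired -> Shutdown without draining).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeState(Enum):
    REQUESTED = "Requested"
    ASSIGNED = "Assigned"
    PAIRED = "Paired"
    DRAIN = "Drain"
    SHUTDOWN = "Shutdown"
    GONE = "Gone"


# Assumed transition graph; the ticket only describes the states,
# so the exact edges here are a guess.
TRANSITIONS = {
    NodeState.REQUESTED: {NodeState.ASSIGNED},
    NodeState.ASSIGNED: {NodeState.PAIRED, NodeState.SHUTDOWN},
    NodeState.PAIRED: {NodeState.DRAIN, NodeState.SHUTDOWN},
    NodeState.DRAIN: {NodeState.SHUTDOWN},
    NodeState.SHUTDOWN: {NodeState.GONE},
    NodeState.GONE: set(),
}

# What node manager would do when it sees a record in each state.
ACTIONS = {
    NodeState.REQUESTED: "send create request",
    NodeState.ASSIGNED: "wait for pairing",
    NodeState.PAIRED: "consult transition table",
    NodeState.DRAIN: "set drain in SLURM and wait for work to complete",
    NodeState.SHUTDOWN: "send shutdown request",
    NodeState.GONE: "delete record",
}


@dataclass
class NodeRecord:
    """One row of the nodes table with the proposed explicit columns."""
    node_size: str
    state: NodeState = NodeState.REQUESTED
    cloud_node: Optional[str] = None

    def set_state(self, target: NodeState) -> None:
        # Reject transitions that are not in the assumed graph.
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {target.value}"
            )
        self.state = target


def next_action(record: NodeRecord) -> str:
    """Return the action implied by the record's current state."""
    return ACTIONS[record.state]
```

Under this sketch, a client such as crunch-dispatch-slurm would ask for a node by creating @NodeRecord(node_size="X")@, which starts in "Requested", and an operator would retire a paired node by calling @set_state(NodeState.DRAIN)@; node manager would only ever react to what @next_action@ reports.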