Idea #12383
Updated by Peter Amstutz over 6 years ago
Suggested list of node record states

* Requested - a create request for node size X will be sent
* Assigned - the create request returned a cloud node id; waiting to pair
* Paired - the cloud node has pinged the API server, indicating it has completed initialization, and is busy or idle (ready to accept work)
** don't record every idle<-->busy transition (slurm is the source of truth here)
** from here the transition table in nodemanager decides when to go into drain/shutdown state. This is based on:
*** node status: busy/down/idle/unpaired
*** shutdown window open/closed (AWS billing optimization, could be removed)
*** boot wait or boot exceeded
*** idle wait or idle exceeded (how long to wait for more work, currently not implemented)
* Drain - will set "drain" state in SLURM, wait for work to complete, then transition to shutdown
* Shutdown - shutdown request will be sent
* Gone - corresponding cloud node is no longer present in the cloud nodes table; the record can be safely deleted

Changes:

* API server gets explicit "state", "node_size", and "cloud_node" columns.
* Control nodes by setting state. Node manager determines the next action based on the state in the nodes table, and is responsive to external changes to state.
** To create a new node, create a node record in "Requested" state.
** To shut down a node, set its state to "Drain" or "Shutdown".
* Consider: instead of having node manager determine the wishlist, crunch-dispatch-slurm could create a new node record in "Requested" state when it wants node manager to start a new node.
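
The proposed states and the "control nodes by setting state" idea can be sketched roughly as follows. This is only an illustration, not node manager code: the names @NodeState@, @ALLOWED@, @request_node@, @set_state@ and @next_action@ are hypothetical, node records are modeled as plain dicts, and the exact set of permitted transitions (for example, Assigned going straight to Shutdown when boot wait is exceeded) is an assumption read off the list above.

```python
from enum import Enum


class NodeState(Enum):
    REQUESTED = "Requested"
    ASSIGNED = "Assigned"
    PAIRED = "Paired"
    DRAIN = "Drain"
    SHUTDOWN = "Shutdown"
    GONE = "Gone"


# Assumed transitions, inferred from the state list in this ticket.
ALLOWED = {
    NodeState.REQUESTED: {NodeState.ASSIGNED},
    NodeState.ASSIGNED: {NodeState.PAIRED, NodeState.SHUTDOWN},  # e.g. boot exceeded
    NodeState.PAIRED: {NodeState.DRAIN, NodeState.SHUTDOWN},
    NodeState.DRAIN: {NodeState.SHUTDOWN},
    NodeState.SHUTDOWN: {NodeState.GONE},
    NodeState.GONE: set(),
}

# Hypothetical action labels: what node manager would do on seeing each state.
ACTIONS = {
    NodeState.REQUESTED: "send_create_request",
    NodeState.ASSIGNED: "wait_for_pairing",
    NodeState.PAIRED: "consult_transition_table",
    NodeState.DRAIN: "set_slurm_drain",
    NodeState.SHUTDOWN: "send_shutdown_request",
    NodeState.GONE: "delete_record",
}


def request_node(node_size):
    """Create a node record in Requested state (stand-in for an API call)."""
    return {"state": NodeState.REQUESTED, "node_size": node_size, "cloud_node": None}


def set_state(record, new_state):
    """Change a record's state, rejecting transitions not in ALLOWED."""
    if new_state not in ALLOWED[record["state"]]:
        raise ValueError(
            f"cannot go from {record['state'].value} to {new_state.value}"
        )
    record["state"] = new_state
    return record


def next_action(record):
    """Return the action implied by the record's current state."""
    return ACTIONS[record["state"]]


if __name__ == "__main__":
    node = request_node("example-size")   # placeholder size name
    print(next_action(node))              # send_create_request
    set_state(node, NodeState.ASSIGNED)
    set_state(node, NodeState.PAIRED)
    set_state(node, NodeState.DRAIN)      # an external client asks for a drain
    print(next_action(node))              # set_slurm_drain
```

Under this sketch, any client (an admin, or crunch-dispatch-slurm) drives a node only by writing the state column, and node manager reacts to whatever state it finds there.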