Idea #12383

Updated by Peter Amstutz over 6 years ago

Suggested list of node states 

 * Requested - create request for node size X will be sent  
 * Assigned - create request returned a cloud node id, waiting to pair 
 * Paired - cloud node has pinged the API server, indicating it has completed initialization, and is busy or idle (ready to accept work) 
 ** don't record every idle<-->busy transition (slurm is the source of truth here) 
 ** from here, the transition table in nodemanager decides when to go into the drain/shutdown state. This is based on: 
 *** node status: busy/down/idle/unpaired 
 *** shutdown window open/closed (AWS billing optimization, could be removed) 
 *** boot wait or boot exceeded 
 *** idle wait or idle exceeded (how long to wait for more work, currently not implemented) 
 * Drain - will set "drain" state in SLURM, wait for work to complete, then transition to shutdown 
 * Shutdown - shutdown request will be sent 
 * Gone - corresponding cloud node is no longer present in the cloud nodes table. 
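
 The states above could be written down roughly as follows. This is only a sketch: the @NodeState@ enum, the @ALLOWED@ table and @can_transition@ are hypothetical names, and the set of permitted transitions is an assumption read off the list (a linear lifecycle, plus a direct Paired to Shutdown step), not something nodemanager implements today.

```python
from enum import Enum


class NodeState(Enum):
    """Proposed node states, in lifecycle order."""
    REQUESTED = "Requested"
    ASSIGNED = "Assigned"
    PAIRED = "Paired"
    DRAIN = "Drain"
    SHUTDOWN = "Shutdown"
    GONE = "Gone"


# Assumed transitions, inferred from the state descriptions.
# A real table would probably also let a node become Gone from any
# state in which a cloud node exists; that is left out for brevity.
ALLOWED = {
    NodeState.REQUESTED: {NodeState.ASSIGNED},
    NodeState.ASSIGNED: {NodeState.PAIRED},
    NodeState.PAIRED: {NodeState.DRAIN, NodeState.SHUTDOWN},
    NodeState.DRAIN: {NodeState.SHUTDOWN},
    NodeState.SHUTDOWN: {NodeState.GONE},
    NodeState.GONE: set(),
}


def can_transition(src, dst):
    """Return True if moving a node from src to dst is allowed."""
    return dst in ALLOWED[src]
```

 Idle/busy changes are deliberately absent from the table, since SLURM stays the source of truth for those.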

 Changes: 

 Explicit "state", "node_size", and "cloud_node" columns. 

 Control nodes by setting state. 

 Node manager determines the next action based on the state in the nodes table: to create a new node, create a node record in the "Requested" state. To shut down a node, set its state to "Drain" or "Shutdown". 
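
 A minimal sketch of what "control nodes by setting state" might look like, using an in-memory list of dicts as a stand-in for the nodes table. The function names (@request_node@, @request_shutdown@, @next_action@) and the action labels are hypothetical; only the three column names and the state names come from this proposal.

```python
# Action node manager would take for a record in each state.
# The labels are illustrative placeholders, not real method names.
ACTIONS = {
    "Requested": "send_create_request",
    "Assigned": "wait_for_pairing",
    "Paired": "consult_transition_table",
    "Drain": "drain_in_slurm_then_shutdown",
    "Shutdown": "send_shutdown_request",
    "Gone": "nothing",
}


def request_node(nodes, node_size):
    """Ask for a new node by adding a record in the Requested state."""
    record = {"state": "Requested", "node_size": node_size, "cloud_node": None}
    nodes.append(record)
    return record


def request_shutdown(record, drain=True):
    """Ask for a node to go away, draining running work first by default."""
    record["state"] = "Drain" if drain else "Shutdown"


def next_action(record):
    """Look up what node manager should do next for this record."""
    return ACTIONS[record["state"]]


nodes = []
node = request_node(nodes, "X")
print(next_action(node))   # send_create_request
request_shutdown(node)
print(next_action(node))   # drain_in_slurm_then_shutdown
```

 With this shape, any client that can write to the nodes table can start or stop nodes without talking to node manager directly.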

 Instead of having node manager determine the wishlist, crunch-dispatch-slurm could create node requests when it wants node manager to start a new node. 
