Bug #4380
closed
To start draining a node:
scontrol update NodeName=computeNN State=DRAIN
To check current node state:
sinfo --noheader -o %t -n computeNN
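As a rough sketch, the two commands above could be wrapped from Python like this. The function names are hypothetical, and the `run` parameter is only there so the state check can be exercised without a real Slurm installation. Stripping a trailing `*` assumes sinfo's convention of marking non-responding nodes that way.

```python
import subprocess


def drain_command(nodename):
    # Same command as above, as an argument list for subprocess.
    return ["scontrol", "update", "NodeName=" + nodename, "State=DRAIN"]


def state_command(nodename):
    # Same sinfo invocation as above; %t prints the compact node state.
    return ["sinfo", "--noheader", "-o", "%t", "-n", nodename]


def parse_state(sinfo_output):
    # Drop the trailing newline and any "*" (non-responding) suffix.
    return sinfo_output.strip().rstrip("*")


def node_state(nodename, run=subprocess.check_output):
    # `run` is injectable (hypothetical design choice) so tests can fake sinfo.
    out = run(state_command(nodename))
    if isinstance(out, bytes):
        out = out.decode()
    return parse_state(out)
```

Calling `subprocess.check_call(drain_command("computeNN"))` would then start the drain, and `node_state("computeNN")` would report its progress.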
One idea: ComputeNodeShutdownActor starts with a start_shutdown() message. The base class uses it to send the current message. SlurmComputeNodeShutdownActor overrides it to initiate shutdown, then check for the result. Tests can override it for better isolation.
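A minimal sketch of that idea, using plain classes in place of real actors. Only `start_shutdown()` comes from the note above; the class names, `shutdown_node()`, `await_drain()`, the injected `drain`/`get_state` callables, and the set of "drained" states are all assumptions made for illustration.

```python
# Assumed set of sinfo states that mean draining has finished.
DRAINED_STATES = frozenset(["drain", "down"])


class ShutdownSketch:
    """Stands in for the base shutdown actor."""

    def __init__(self, nodename):
        self.nodename = nodename
        self.log = []

    def start_shutdown(self):
        # Base behavior: go straight to the existing shutdown step.
        self.shutdown_node()

    def shutdown_node(self):
        self.log.append("shutdown " + self.nodename)


class SlurmShutdownSketch(ShutdownSketch):
    """Stands in for the Slurm variant: drain first, then shut down."""

    def __init__(self, nodename, drain, get_state):
        super().__init__(nodename)
        self._drain = drain          # callable that starts the drain
        self._get_state = get_state  # callable that returns the sinfo state

    def start_shutdown(self):
        self._drain(self.nodename)
        self.log.append("drain " + self.nodename)
        self.await_drain()

    def await_drain(self):
        # Checks once; a real actor would reschedule this check on a timer.
        if self._get_state(self.nodename) in DRAINED_STATES:
            self.shutdown_node()
        else:
            self.log.append("waiting")
```

A test can subclass either class and replace `start_shutdown()` with a no-op, which is the isolation benefit mentioned above.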
Add a config variable to set the local dispatch method. If none is specified, use the actors from the base computenode module.
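One way the fallback could look, sketched with stub classes. The registry, the `"slurm"` key, and the function name are hypothetical; the real config variable name is not decided here.

```python
class BaseShutdown:
    """Stub for the shutdown actor in the base computenode module."""


class SlurmShutdown(BaseShutdown):
    """Stub for the Slurm-specific shutdown actor."""


# Hypothetical mapping from config value to actor class.
DISPATCHERS = {"slurm": SlurmShutdown}


def shutdown_class_for(dispatcher=None):
    # No dispatch method configured: fall back to the base actors.
    if not dispatcher:
        return BaseShutdown
    # An unknown name raises KeyError rather than silently falling back.
    return DISPATCHERS[dispatcher]
```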
Does ComputeNodeShutdownActor need to be passed the ComputeNodeMonitorActor, so that SlurmComputeNodeShutdownActor can re-check shutdown eligibility after draining is done? This would be a pretty significant overhaul…
- Status changed from New to In Progress
Ward says that if the node's shutdown window closes while the node is draining, Node Manager should cancel the shutdown and undrain the node.
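The rule Ward describes could be sketched as a small decision function. The function names, the returned labels, and the set of "drained" states are assumptions. `State=RESUME` is the scontrol state used here to undrain a node. This sketch cancels whenever the window is closed, including the case where the node has already finished draining, which goes slightly beyond what is stated above.

```python
# Assumed set of sinfo states that mean draining has finished.
DRAINED_STATES = frozenset(["drain", "down"])


def resume_command(nodename):
    # Undrain the node so Slurm can schedule jobs on it again.
    return ["scontrol", "update", "NodeName=" + nodename, "State=RESUME"]


def next_step(slurm_state, window_open):
    # Window closed: cancel the shutdown and undrain.
    if not window_open:
        return "cancel_and_undrain"
    # Window still open and the drain has finished: proceed to shutdown.
    if slurm_state in DRAINED_STATES:
        return "shutdown"
    # Window still open, node still draining: check again later.
    return "keep_waiting"
```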
4380-node-manager-computenode-reorg-wip is up for review. See the commit message in 0d49d9d for rationale. I'm pushing this for review separately to try to avoid too much mutual blocking between this and #4138.
Reviewing 4380-node-manager-computenode-reorg-wip at 0d49d9d0a
Looks good. Only minor comment: launcher.py still imports ComputeNodeSetupActor, ComputeNodeShutdownActor, ComputeNodeUpdateActor and ShutdownTimer, but of these, it only appears to use ComputeNodeUpdateActor. Are all of these imports necessary for reasons I can't obviously see?
Other than that LGTM. Thanks.
Tim Pierce wrote:
Looks good. Only minor comment: launcher.py still imports ComputeNodeSetupActor, ComputeNodeShutdownActor, ComputeNodeUpdateActor and ShutdownTimer, but of these, it only appears to use ComputeNodeUpdateActor. Are all of these imports necessary for reasons I can't obviously see?
Nope. That bug even predates this branch. I cleaned up these imports, including ShutdownTimer, and merged. Thanks.
- Status changed from In Progress to Resolved
- % Done changed from 25 to 100
Applied in changeset arvados|commit:6c68141eb50255128cf38b5717b15b16f2a8cdff.