Bug #4839
[Node Manager] Should look at Arvados node's crunch_worker_state, not info['slurm_state']
Status: Resolved
Priority: Normal
Assigned To:
Category: Node Manager
Target version:
Story points: 0.5
Description
As of this writing, the Node Manager's ComputeNodeMonitorActor and friends look at the node record's info['slurm_state'] string to decide whether or not the node is eligible for shutdown.
But the API server is responsible for knowing its dispatch method and translating the dispatcher-specific state into a common string. It exposes this string as crunch_worker_state, which can be one of 'busy', 'idle', or 'down'. Node Manager should use this field, not info['slurm_state'], to make shutdown decisions.
Note that I'm only talking about making a change when Node Manager is looking at a node record to make shutdown decisions. Code that talks to SLURM directly, like the ComputeNodeShutdownActor in the SLURM dispatch module, doesn't need to be changed.
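A minimal sketch of the proposed check, for illustration only. The function names (worker_state, in_state, shutdown_eligible) are hypothetical and not taken from the Node Manager source, and treating only 'idle' as eligible for shutdown is an assumption; the ticket does not say which states should qualify. The node record is modeled as a plain dict.

```python
# Hypothetical helpers; these names are illustrative, not Node Manager's API.

def worker_state(arvados_node):
    """Return the record's crunch_worker_state, or None if unavailable."""
    if arvados_node is None:
        return None
    return arvados_node.get('crunch_worker_state')


def in_state(arvados_node, *states):
    """True if the node's crunch_worker_state is one of the given states."""
    return worker_state(arvados_node) in states


def shutdown_eligible(arvados_node, eligible_states=('idle',)):
    """Decide shutdown eligibility from crunch_worker_state only.

    Assumption: only 'idle' nodes are eligible by default. The
    dispatcher-specific info['slurm_state'] value is never consulted.
    """
    return in_state(arvados_node, *eligible_states)


# Example record: the SLURM-specific string is present but ignored.
node = {'crunch_worker_state': 'busy', 'info': {'slurm_state': 'idle'}}
print(shutdown_eligible(node))
```

Keeping the eligible states as a parameter leaves the policy question (for example, whether 'down' nodes should also be shut down) separate from the question of which field is read.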