Idea #4127
[API] Nodes have a method to request and record shutdowns
Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date: 07/17/2014
Due date: 07/17/2014
Story points: 3.0
Description
The current Node Manager decides to shut down cloud nodes based on a node record's SLURM state. Because checking that state and shutting the node down are separate steps, a node can be shut down shortly after it is allocated work. This isn't a huge loss of compute time, but it does cause a Job failure that can look mysterious at first.
It would be better if the API server provided an atomic way to request and record Node shutdowns. This has a few components:
- Add a method to NodesController that marks a node as "being shut down" if and only if it is not currently running a Job.
- Modify the Node model so that attempts to assign a job to it (setting job_uuid) fail if it's marked as "being shut down."
- Modify crunch-dispatch so that it updates node assignments on the API server, and checks for OK responses, before it begins dispatching work.
- Modify the Node Manager to request shutdowns with the API server, and only proceed after an OK response.
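The intended mutual exclusion between the first two components could look roughly like the sketch below. It is an in-memory illustration only, not the API server's code: the `NodeRecord` class, its `request_shutdown` and `assign_job` methods, and the `shutting_down` flag are all hypothetical names, and the `threading.Lock` stands in for whatever transaction or row lock the real server would use. Only `job_uuid` comes from the description above.

```python
import threading


class NodeRecord:
    """Hypothetical in-memory stand-in for the API server's Node model."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.job_uuid = None           # set while the node is assigned a Job
        self.shutting_down = False     # hypothetical "being shut down" marker
        self._lock = threading.Lock()  # stands in for a database transaction

    def request_shutdown(self):
        """Record a shutdown if and only if no Job is assigned.

        Returns True (the "OK response") when the shutdown was recorded.
        """
        with self._lock:
            if self.job_uuid is not None:
                return False
            self.shutting_down = True
            return True

    def assign_job(self, job_uuid):
        """Set job_uuid unless the node is marked as being shut down.

        Returns True (the "OK response") when the assignment was recorded.
        """
        with self._lock:
            if self.shutting_down:
                return False
            self.job_uuid = job_uuid
            return True


# Node Manager side: only proceed with the shutdown after an OK response.
idle_node = NodeRecord("node-a")
shutdown_ok = idle_node.request_shutdown()

# Dispatcher side: only dispatch work after an OK response.
dispatch_ok = idle_node.assign_job("job-1")
```

Because both checks run under the same lock, whichever request arrives first wins and the other caller gets a refusal it can act on, instead of a Job failing on a node that disappears underneath it.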
Updated by Brett Smith almost 10 years ago
It's not clear that we need this now that #4380 is done.