[API] Nodes have a method to request and record shutdowns
The current Node Manager decides to shut down cloud nodes based on a node record's SLURM state. Because that decision is not coordinated with Job dispatch, a Node can be shut down shortly after it has been allocated work. This doesn't waste much compute time, but it does cause a Job failure whose cause can look mysterious at first.
It would be better if the API server provided an atomic way to request and record Node shutdowns. This has a few components:
- Add a method to NodesController that marks a node as "being shut down" if and only if it is not currently running a Job.
- Modify the Node model so that any attempt to assign a Job to it (by setting job_uuid) fails if it's marked as "being shut down."
- Modify crunch-dispatch so that it updates node assignments on the API server, and checks for OK responses, before it begins dispatching work.
- Modify the Node Manager to request shutdowns from the API server, and only proceed after an OK response.
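Taken together, these components form a check-and-set protocol. The following Python sketch models it in memory, assuming a single lock stands in for whatever transaction or row lock the API server would actually use, and that raised exceptions stand in for non-OK HTTP responses. All names here (NodeStore, request_shutdown, assign_job, shutting_down, dispatch, try_shutdown) are hypothetical illustrations, not the Arvados API.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


class NodeBusyError(Exception):
    """Stands in for a non-OK response: shutdown requested on a busy node."""


class NodeShuttingDownError(Exception):
    """Stands in for a non-OK response: Job assigned to a node being shut down."""


@dataclass
class NodeRecord:
    uuid: str
    job_uuid: Optional[str] = None
    shutting_down: bool = False  # hypothetical "being shut down" flag


class NodeStore:
    """Hypothetical in-memory stand-in for the API server's Node records.

    One lock serializes each check and its write, so request_shutdown and
    assign_job are atomic with respect to each other. A real server would
    use a database transaction or row lock instead (assumption).
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._nodes: Dict[str, NodeRecord] = {}

    def add(self, uuid: str) -> None:
        with self._lock:
            self._nodes[uuid] = NodeRecord(uuid)

    def request_shutdown(self, uuid: str) -> None:
        # Component 1: mark the node only if it is not running a Job.
        with self._lock:
            node = self._nodes[uuid]
            if node.job_uuid is not None:
                raise NodeBusyError(f"{uuid} is running {node.job_uuid}")
            node.shutting_down = True

    def assign_job(self, uuid: str, job_uuid: str) -> None:
        # Component 2: refuse to set job_uuid on a node being shut down.
        with self._lock:
            node = self._nodes[uuid]
            if node.shutting_down:
                raise NodeShuttingDownError(f"{uuid} is being shut down")
            node.job_uuid = job_uuid


def dispatch(store: NodeStore, uuid: str, job_uuid: str) -> bool:
    # Component 3: the dispatcher records the assignment first and only
    # starts work if the server accepted it.
    try:
        store.assign_job(uuid, job_uuid)
    except NodeShuttingDownError:
        return False
    return True  # safe to begin dispatching work


def try_shutdown(store: NodeStore, uuid: str) -> bool:
    # Component 4: the Node Manager proceeds only after an OK response.
    try:
        store.request_shutdown(uuid)
    except NodeBusyError:
        return False
    return True  # safe to destroy the cloud node
```

Because both checks run under the same lock, a dispatch and a shutdown request racing for the same idle node cannot both succeed: whichever acquires the lock first wins, and the other receives an error instead of acting on stale state.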