Task #4322
Idea #4293 (closed): [Node Manager] Write off cloud nodes that spend too long in booted state
[Node Manager] Should not pair cloud and Arvados nodes immediately after booting
Description
The development of Node Manager went like this:
- First, it immediately paired cloud and Arvados nodes when booting them, to avoid duplicate assignment.
- This didn't prevent duplicate assignments in cases where the cloud node struggled to come up, so we added the assigned_at logic to cover those.
- We added the separate "booted" state to nodes to better handle eventual consistency in the clouds.
Taken together, these changes make the logic that immediately pairs nodes on boot not only unnecessary but also harmful. A ComputeNodeMonitorActor for a node that doesn't bootstrap correctly may make incorrect shutdown suggestions based on the data in its paired Arvados node record. Removing the immediate pairing logic will make it easier to identify when a node has failed to bootstrap, and to shut it back down appropriately.
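To make the proposed behavior concrete, here is a minimal Python sketch of deferred pairing. Everything in it is hypothetical and for illustration only: the CloudNode and ArvadosNode classes, the reported_cloud_id field, the pair_nodes and failed_bootstrap functions, and the 20-minute BOOT_TIMEOUT_SECS value are not Node Manager's actual classes or the ComputeNodeMonitorActor API. The idea it shows is that a cloud node is paired only after an Arvados node record shows the node actually came up and pinged, so a cloud node that stays unpaired past a timeout can be flagged as a failed bootstrap instead of being judged on Arvados data that never described it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical limit on how long a booted cloud node may stay unpaired.
BOOT_TIMEOUT_SECS = 20 * 60


@dataclass
class CloudNode:
    id: str
    booted_at: float                    # when the cloud reported the node as booted (hypothetical)
    arvados_uuid: Optional[str] = None  # set only after pairing


@dataclass
class ArvadosNode:
    uuid: str
    reported_cloud_id: Optional[str]  # hypothetical: filled in by the node itself once it bootstraps
    last_ping_at: Optional[float]     # None until the node has pinged Arvados


def pair_nodes(cloud_nodes: List[CloudNode], arvados_nodes: List[ArvadosNode]) -> None:
    """Pair cloud nodes only with Arvados records that show the node came up."""
    taken = {c.arvados_uuid for c in cloud_nodes if c.arvados_uuid is not None}
    candidates: Dict[str, ArvadosNode] = {}
    for a in arvados_nodes:
        # Skip records that never pinged, don't name a cloud node, or are already paired.
        if a.last_ping_at is None or a.reported_cloud_id is None or a.uuid in taken:
            continue
        candidates[a.reported_cloud_id] = a
    for c in cloud_nodes:
        if c.arvados_uuid is None and c.id in candidates:
            # pop() ensures each Arvados record is assigned to at most one cloud node.
            c.arvados_uuid = candidates.pop(c.id).uuid


def failed_bootstrap(cloud_nodes: List[CloudNode], now: float) -> List[str]:
    """Return ids of cloud nodes still unpaired after the boot timeout."""
    return [
        c.id
        for c in cloud_nodes
        if c.arvados_uuid is None and now - c.booted_at > BOOT_TIMEOUT_SECS
    ]
```

For example, with two cloud nodes booted at time 0 where only "i-1" has a pinging Arvados record, pair_nodes pairs "i-1" and leaves "i-2" unpaired; failed_bootstrap then returns an empty list at 600 seconds and ["i-2"] at 1500 seconds. Because an unpaired node carries no Arvados data, the shutdown decision for it rests only on whether it bootstrapped in time.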
Updated by Ward Vandewege about 10 years ago
- Target version changed from Arvados Future Sprints to 2014-11-19 sprint
Updated by Brett Smith about 10 years ago
- Tracker changed from Bug to Task
- Parent task set to #4293
Updated by Brett Smith about 10 years ago
- Status changed from New to Resolved
- Remaining (hours) set to 0.0