Task #4322

closed

Idea #4293: [Node Manager] Write off cloud nodes that spend too long in booted state

[Node Manager] Should not pair cloud and Arvados nodes immediately after booting

Added by Brett Smith about 10 years ago. Updated about 10 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Target version:

Description

The development of Node Manager went like this:

  • First, it immediately paired cloud and Arvados nodes when booting them, to avoid duplicate assignment.
  • This didn't prevent duplicate assignments in cases where the cloud node struggled to come up, so we added the assigned_at logic to cover those.
  • We added the separate "booted" state to nodes to better handle eventual consistency in the clouds.

Taken together, these changes mean the logic to immediately pair nodes on boot is not only unnecessary but also causes trouble. A ComputeNodeMonitorActor for a node that doesn't bootstrap correctly may make incorrect shutdown suggestions based on the data in the paired Arvados node. Taking out the immediate pairing logic will make it easier to identify when a node has failed to bootstrap, and to shut it back down appropriately.
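As a rough illustration of the proposed behavior, the sketch below pairs a cloud node with an Arvados node only once there is evidence that the node actually bootstrapped, and flags cloud nodes that stay unpaired for too long. It is not Node Manager's actual code: the CloudNode and ArvadosNode classes, the find_pairing and should_write_off functions, the IP-and-ping matching rule, and the 600-second boot_timeout are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloudNode:
    """Hypothetical stand-in for a cloud provider's node record."""
    node_id: str
    ip_address: Optional[str]
    booted_at: float  # seconds since the epoch


@dataclass
class ArvadosNode:
    """Hypothetical stand-in for an Arvados node record."""
    uuid: str
    ip_address: Optional[str]
    last_ping_at: Optional[float]  # None until the node pings Arvados


def find_pairing(cloud_node, arvados_nodes):
    """Return the Arvados node that shows evidence of having come up on
    this cloud node, or None if there is no such evidence yet.

    Assumed matching rule: same IP address, and a ping received at or
    after the cloud node's boot time.
    """
    if cloud_node.ip_address is None:
        return None
    for arv_node in arvados_nodes:
        pinged_since_boot = (arv_node.last_ping_at is not None and
                             arv_node.last_ping_at >= cloud_node.booted_at)
        if arv_node.ip_address == cloud_node.ip_address and pinged_since_boot:
            return arv_node
    return None


def should_write_off(cloud_node, arvados_nodes, now, boot_timeout=600.0):
    """Suggest shutdown for a cloud node that is still unpaired after
    boot_timeout seconds in the booted state."""
    unpaired = find_pairing(cloud_node, arvados_nodes) is None
    return unpaired and (now - cloud_node.booted_at) > boot_timeout
```

Because nothing is paired at boot time, an unpaired node past the timeout can be treated as a bootstrap failure without consulting stale data from an Arvados node record.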

Actions #1

Updated by Ward Vandewege about 10 years ago

  • Target version changed from Arvados Future Sprints to 2014-11-19 sprint
Actions #2

Updated by Brett Smith about 10 years ago

  • Tracker changed from Bug to Task
  • Parent task set to #4293
Actions #3

Updated by Brett Smith about 10 years ago

  • Assigned To set to Brett Smith
Actions #4

Updated by Brett Smith about 10 years ago

  • Status changed from New to Resolved
  • Remaining (hours) set to 0.0