Idea #7980

[Node Manager] Broken nodes always get shut down, exempt from all node-counting considerations

Added by Brett Smith over 8 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: Node Manager
Target version: -
Story points: -

Description

Right now we want the workaround from #7286 to kick in, but the daemon is declining to shut the nodes down because there are jobs in the queue waiting to be run. Shutdowns caused by node malfunction (basically, any case in ComputeNodeMonitorActor.shutdown_eligible except for self.in_state('idle')) should be respected unconditionally.

This will probably involve having consider_shutdown send a "force" boolean to its subscribers, and updating the daemon to respect that.
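
A rough sketch of that idea, not the actual Node Manager code: the classes below are simplified, hypothetical stand-ins (no actor framework), and the state names, the `node_can_shutdown` callback, and the `queued_jobs` counter are assumptions made for illustration. Only `shutdown_eligible`, `consider_shutdown`, and `in_state('idle')` come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class NodeMonitor:
    """Hypothetical stand-in for ComputeNodeMonitorActor."""
    name: str
    state: str  # illustrative values: "idle", "busy", "broken"
    subscribers: List[Callable[["NodeMonitor", bool], None]] = field(
        default_factory=list)

    def in_state(self, *states: str) -> bool:
        return self.state in states

    def shutdown_eligible(self) -> bool:
        # Simplified assumption: only idle or broken nodes are eligible.
        return self.in_state("idle", "broken")

    def consider_shutdown(self) -> None:
        if not self.shutdown_eligible():
            return
        # Any eligibility reason other than "idle" counts as a malfunction,
        # so the shutdown should be forced.
        force = not self.in_state("idle")
        for notify in self.subscribers:
            notify(self, force)


class Daemon:
    """Hypothetical stand-in for the daemon that approves shutdowns."""

    def __init__(self, queued_jobs: int) -> None:
        self.queued_jobs = queued_jobs
        self.shut_down: List[str] = []

    def node_can_shutdown(self, monitor: NodeMonitor,
                          force: bool = False) -> None:
        # Node-counting considerations apply only to unforced shutdowns.
        if not force and self.queued_jobs > 0:
            return
        self.shut_down.append(monitor.name)


daemon = Daemon(queued_jobs=2)
idle_node = NodeMonitor("compute0", "idle")
broken_node = NodeMonitor("compute1", "broken")
for node in (idle_node, broken_node):
    node.subscribers.append(daemon.node_can_shutdown)
    node.consider_shutdown()
```

With jobs still queued, only the broken node ends up in `daemon.shut_down`; the idle node is kept because its shutdown request is not forced.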

#1

Updated by Brett Smith over 8 years ago

  • Description updated
  • Category set to Node Manager

#2

Updated by Brett Smith over 8 years ago

  • Target version set to Arvados Future Sprints

#3

Updated by Ward Vandewege over 3 years ago

  • Status changed from New to Closed

#4

Updated by Ward Vandewege over 3 years ago

  • Target version deleted (Arvados Future Sprints)