Bug #5736

[Node Manager] Reuse node records after shutting down the cloud node set up with them

Added by Brett Smith about 9 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: Node Manager
Target version: -
Story points: 1.0

Description

When Node Manager uses an Arvados node record to set up a new compute node, it records the time that setup starts. It won't reuse this record for another setup until node_stale_time seconds have passed, even if the setup is aborted because the cloud node is no longer needed, or the cloud node fails to pair with its Arvados node. It would help if Node Manager could reuse those records sooner; we just hit slot_number exhaustion on a cluster because records weren't being reused for this reason.

This is non-trivial, because Node Manager retains no memory of which Arvados node record was used to set up a compute node. ComputeNodeMonitorActor can be initialized with an arvados_node, but the daemon relies on that value starting as None and being set only during the pairing process, which is how it detects unpaired nodes. Implementing this will require more involved state tracking.
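A minimal sketch of the kind of state tracking this might involve, not Node Manager's actual code. The class `SetupRecordTracker`, its methods, and the setup and record identifiers are all hypothetical names made up for illustration; only node_stale_time comes from the existing configuration. The idea is to remember which record backs each in-progress setup, so that an aborted setup can release its record immediately instead of waiting out the stale timer.

```python
import time


class SetupRecordTracker:
    """Hypothetical helper: remembers which Arvados node record each setup uses."""

    def __init__(self, node_stale_time, clock=time.time):
        self.node_stale_time = node_stale_time
        self._clock = clock
        self._setup_started = {}     # record UUID -> time its setup began
        self._record_for_setup = {}  # setup ID -> record UUID

    def begin_setup(self, setup_id, record_uuid):
        # Current behavior: note when setup starts for this record.
        self._setup_started[record_uuid] = self._clock()
        # Extra state: remember which record this setup is using.
        self._record_for_setup[setup_id] = record_uuid

    def is_reusable(self, record_uuid):
        # A record never used for setup, or whose setup began at least
        # node_stale_time seconds ago, may be handed out again.
        started = self._setup_started.get(record_uuid)
        if started is None:
            return True
        return (self._clock() - started) >= self.node_stale_time

    def abort_setup(self, setup_id):
        # When setup is aborted or pairing fails, forget the start time
        # so the record becomes reusable right away.
        record_uuid = self._record_for_setup.pop(setup_id, None)
        if record_uuid is not None:
            self._setup_started.pop(record_uuid, None)
        return record_uuid
```

With something like this, the daemon could call `abort_setup` when it shuts down an unneeded or unpaired cloud node, without changing how ComputeNodeMonitorActor's arvados_node is initialized.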

Conflicts with #4129.


Subtasks 1 (0 open, 1 closed)

Task #5995: Review 5736-node-manager-easy-slot-cleanup-wip | Resolved | Ward Vandewege | 05/11/2015

Related issues

Related to Arvados - Bug #4129: [Node Manager] Don't reuse node table records | Closed