Bug #5292

[Node Manager] Failed to recognize busy node on qr1hi

Added by Brett Smith about 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: Node Manager
Target version: -
Story points: -

Description

This morning on qr1hi, the single minimum compute node was up (qr1hi-7ekkf-09hjulgcrpxp1iw), and running a job (qr1hi-8i9sb-mcla1dzm2zrpl0t).

At 10:15 EST, qr1hi-8i9sb-palag4xt4jjln0v was added to the queue. This was reflected in Node Manager's internal server wishlist, but Node Manager did not start a node to accommodate it.

Several more jobs and nodes came up after that. I haven't tracked down the cause yet, so I'm not claiming this is it, but in general Node Manager acted as if one of the running nodes was idle when it was in fact busy. It created more nodes as more jobs were added to the queue, but it was always one node behind.
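
To illustrate the symptom, here is a minimal sketch of the arithmetic, not Node Manager's actual code. The function name `nodes_to_boot` and the plain "busy"/"idle" state strings are hypothetical, and the model assumes the manager boots one node per wishlist entry that no idle node can absorb. Under that assumption, misreading a single busy node as idle leaves the cluster exactly one node short, however long the queue grows.

```python
def nodes_to_boot(wishlist_size, node_states):
    """Return how many new nodes to start (hypothetical, simplified model).

    wishlist_size: number of nodes the queued jobs are asking for.
    node_states: one "busy" or "idle" string per running node, as the
    manager believes them to be.
    """
    idle = sum(1 for state in node_states if state == "idle")
    # Idle nodes are assumed able to absorb queued work; boot the rest.
    return max(0, wishlist_size - idle)


# One node is up and busy, and one job is queued: a new node is needed.
correct = nodes_to_boot(1, ["busy"])

# Same situation, but the busy node is misread as idle: nothing is started.
observed = nodes_to_boot(1, ["idle"])

# With a longer queue, the shortfall stays at exactly one node.
shortfall = nodes_to_boot(4, ["busy"]) - nodes_to_boot(4, ["idle"])
```

In this model the shortfall equals the number of busy nodes mistaken for idle, which matches the "always behind by one" behavior described above if a single node was misclassified.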

The original compute node record's crunch_worker_state appears correct now, so nothing looks blatantly wrong there.


Related issues

Is duplicate of Arvados - Bug #4751: [Node Manager] Can erroneously pair cloud nodes with stale Arvados node records (Resolved, Brett Smith, 03/02/2015)