Feature #20383

open

Monitoring that gives list of compute containers that don't seem to be making progress

Added by Peter Amstutz 11 months ago. Updated 11 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: Crunch
Target version:
Story points: -

Description

Want to get a real-time list of containers whose CPU usage and I/O usage are both very low, indicating that they aren't doing any work.
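
As a rough illustration of the request, here is a minimal sketch of how such a list could be computed from two usage samples per container. Everything in it is an assumption made for illustration: the `Sample` shape, the `is_idle` and `idle_containers` helpers, and the threshold values are hypothetical and do not come from Arvados or Crunch.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One cumulative usage reading for a container (hypothetical shape)."""
    timestamp: float    # seconds since some fixed reference point
    cpu_seconds: float  # cumulative CPU time consumed so far
    io_bytes: int       # cumulative bytes read plus bytes written so far


def is_idle(prev, curr, max_cpu_fraction=0.01, max_io_bytes_per_sec=1024.0):
    """Return True if both CPU and I/O rates between two samples are very low.

    The default thresholds are placeholders, not recommended values.
    """
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        # Without a positive interval no rate can be computed,
        # so do not flag the container.
        return False
    cpu_fraction = (curr.cpu_seconds - prev.cpu_seconds) / elapsed
    io_rate = (curr.io_bytes - prev.io_bytes) / elapsed
    return cpu_fraction < max_cpu_fraction and io_rate < max_io_bytes_per_sec


def idle_containers(samples, **thresholds):
    """Given {container_id: (prev, curr)}, return the sorted ids that look idle."""
    return sorted(
        container_id
        for container_id, (prev, curr) in samples.items()
        if is_idle(prev, curr, **thresholds)
    )


if __name__ == "__main__":
    samples = {
        "ctr-a": (Sample(0, 10.0, 1_000), Sample(60, 10.1, 2_000)),
        "ctr-b": (Sample(0, 10.0, 1_000), Sample(60, 40.0, 10_000_000)),
    }
    print(idle_containers(samples))
```

Requiring both rates to be low, rather than either one, avoids flagging containers that are CPU-bound with little I/O or I/O-bound with little CPU.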

Actions #1

Updated by Peter Amstutz 11 months ago

  • Subject changed from Monitoring that gives list of "idle" compute nodes to Monitoring that gives list of compute containers that don't seem to be making progress
Actions #2

Updated by Peter Amstutz 11 months ago

  • Description updated (diff)
Actions #3

Updated by Brett Smith 11 months ago

What about CUDA jobs? If they're pegging the GPU but nothing else, is that reported? Can they be excluded from this list?
