Feature #20383
open
Monitoring that gives list of compute containers that don't seem to be making progress
Added by Peter Amstutz almost 2 years ago.
Updated almost 2 years ago.
Description
Want to get a real-time list of containers whose CPU usage and I/O usage are very low, indicating that they aren't doing any work.
- Subject changed from Monitoring that gives list of "idle" compute nodes to Monitoring that gives list of compute containers that don't seem to be making progress
- Description updated
What about CUDA jobs? If they're pegging the GPU but nothing else, is that reported? Can they be excluded from this list?
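A rough sketch of the check being discussed, in case it helps pin down the behavior: flag a container only when both CPU and I/O are below a threshold, and skip it if a GPU utilization figure is available and shows the GPU is busy. Everything here is hypothetical and for illustration only: the `ContainerSample` record, its field names, the `stalled_containers` function, and the threshold values are assumptions, not existing Arvados APIs or metrics.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ContainerSample:
    """Hypothetical per-container usage averaged over a recent window."""
    uuid: str
    cpu_fraction: float                   # CPU use as a fraction of one core
    io_bytes_per_sec: float               # combined read + write throughput
    gpu_fraction: Optional[float] = None  # None when no GPU metric is collected


def stalled_containers(
    samples: Iterable[ContainerSample],
    cpu_max: float = 0.05,     # assumed threshold, would need tuning
    io_max: float = 1024.0,    # assumed threshold, would need tuning
    gpu_max: float = 0.05,     # assumed threshold, would need tuning
) -> List[str]:
    """Return UUIDs of containers that look idle on CPU and I/O.

    A container with a known GPU utilization above gpu_max is excluded,
    so a CUDA job that only pegs the GPU is not reported.
    """
    stalled = []
    for s in samples:
        # Any meaningful CPU or I/O activity counts as progress.
        if s.cpu_fraction > cpu_max or s.io_bytes_per_sec > io_max:
            continue
        # GPU-bound work counts as progress when the metric exists.
        if s.gpu_fraction is not None and s.gpu_fraction > gpu_max:
            continue
        stalled.append(s.uuid)
    return stalled
```

Note that if no GPU metric is collected at all (`gpu_fraction` is `None`), a GPU-only job would still show up in the list, so the exclusion depends on that metric being available.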