Idea #7901

Updated by Tom Clegg over 8 years ago

Use case: A bioinformatician is cost-optimizing a functional pipeline. They want to know whether the individual jobs in the pipeline are making efficient use of the compute node(s) available to them.

 Provide a script to help answer that question.    It should take a job UUID and/or job log filename as an argument, parse the crunchstat lines in the corresponding log, and report the maximum point-in-time reading for each of the following resources: 

 * CPU used 
 ** Report a single top-style figure like "1600%" (meaning full utilization of 16 cores) 
 ** Don't bother with separate user/sys reporting; figuring out a good way to report those figures at maximum utilization will be a separate story if needed 
 * RAM used 
 * Swap used 
 * Network throughput for each net device mentioned 
 * I/O throughput for each blk device mentioned 
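
As a rough illustration of the top-style CPU figure, the sketch below assumes the values after the "-- interval" delimiter are CPU seconds consumed during that interval, which is how the sample lines in this ticket read. The function name @cpu_percent@ is hypothetical and not part of any existing tool.

```python
def cpu_percent(user_seconds, sys_seconds, interval_seconds):
    """Return a top-style CPU figure, e.g. 1600.0 for 16 fully busy cores.

    Assumes user_seconds and sys_seconds are CPU seconds consumed during
    an interval that lasted interval_seconds of wall-clock time.
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    # user and sys are combined, since separate reporting is out of scope.
    return 100.0 * (user_seconds + sys_seconds) / interval_seconds


# 160 CPU seconds consumed in a 10-second interval is 16 busy cores.
print(f"{cpu_percent(160.0, 0.0, 10.0):.0f}%")
```

With these inputs the figure comes out at 1600%, matching the "full utilization of 16 cores" reading described above.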

 Essentially, print the maximum value of every statistic ever printed by crunchstat. The max value before the "-- interval" delimiter will indicate maximum total usage by any one task. The max value of stats that come after the "-- interval" delimiter will indicate maximum throughput / CPU load. 

 When selecting the max value of an interval/throughput value, note the actual interval too. For example: 
 * <pre> 
 crunchstat: cpu 33.0700 user 3.9600 sys 8 cpus -- interval 10.0003 seconds 33.0700 user 3.9600 sys 
 crunchstat: cpu 66.1500 user 7.9600 sys 8 cpus -- interval 11.0003 seconds 33.0800 user 4.0000 sys 
 --- 
 max cpu 66.1500 user 7.9600 sys 8 cpus 
 max cpu interval 11.0003 seconds 37.0800 user+sys 
 </pre> 
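
The selection described above could be sketched as follows. This is only an illustration: the @samples@ structure, and the function names @max_cumulative@ and @max_interval@, are hypothetical, and the samples are assumed to be already parsed from the two cpu lines in the example.

```python
# Hypothetical pre-parsed cpu samples:
# (cumulative values, interval seconds, values within that interval)
samples = [
    ({"user": 33.07, "sys": 3.96}, 10.0003, {"user": 33.07, "sys": 3.96}),
    ({"user": 66.15, "sys": 7.96}, 11.0003, {"user": 33.08, "sys": 4.00}),
]


def max_cumulative(samples):
    """Return the largest value seen for each cumulative counter."""
    best = {}
    for cumulative, _seconds, _delta in samples:
        for name, value in cumulative.items():
            best[name] = max(best.get(name, value), value)
    return best


def max_interval(samples):
    """Return (seconds, values) for the sample with the largest interval total.

    The interval length is returned alongside the values so the report
    can note the actual interval, as requested above.
    """
    _cumulative, seconds, delta = max(samples, key=lambda s: sum(s[2].values()))
    return seconds, delta


seconds, delta = max_interval(samples)
print("max cpu", max_cumulative(samples))
print(f"max cpu interval {seconds} seconds {sum(delta.values()):.4f} user+sys")
```

Note that this picks the largest raw interval value, as the example does. Dividing by the interval length before comparing would be a different design choice, and for these two samples it would select the first one instead.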

 It's understood that the script can't report whether the underlying tool would run just as well with fewer resources. It only reports whether the tool is using the resources it has. 

 Some example crunchstat output: 
 * <pre> 
 crunchstat: mem 57286656 cache 0 swap 27 pgmajfault 1510322176 rss 
 crunchstat: cpu 33.0700 user 3.9600 sys 8 cpus -- interval 10.0003 seconds 29.8600 user 3.4900 sys 
 crunchstat: blkio:202:16 0 write 3665920 read -- interval 10.0003 seconds 0 write 0 read 
 crunchstat: net:eth0 3301 tx 153030 rx -- interval 10.0003 seconds 0 tx 0 rx 
 </pre>
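
A minimal parsing sketch for lines shaped like the sample output above. It assumes every statistic is written as a "value name" pair and that a device name, when present, follows the category after a colon. The names @LINE_RE@, @parse_pairs@ and @parse_line@ are hypothetical, and real logs may contain variations this does not handle.

```python
import re

# category[:device], cumulative "value name" pairs, then an optional
# "-- interval N seconds" section with per-interval "value name" pairs.
LINE_RE = re.compile(
    r"crunchstat: (?P<category>\S+) (?P<cumulative>.*?)"
    r"(?: -- interval (?P<seconds>[\d.]+) seconds (?P<delta>.*))?$"
)


def parse_pairs(text):
    """Turn '3301 tx 153030 rx' into {'tx': 3301.0, 'rx': 153030.0}."""
    words = text.split()
    return {name: float(value) for value, name in zip(words[0::2], words[1::2])}


def parse_line(line):
    """Parse one crunchstat line; return None for any other log line."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    # "blkio:202:16" -> category "blkio", device "202:16"
    category, _, device = match["category"].partition(":")
    return {
        "category": category,
        "device": device or None,
        "cumulative": parse_pairs(match["cumulative"]),
        "seconds": float(match["seconds"]) if match["seconds"] else None,
        "delta": parse_pairs(match["delta"]) if match["delta"] else {},
    }


record = parse_line(
    "crunchstat: net:eth0 3301 tx 153030 rx -- interval 10.0003 seconds 0 tx 0 rx"
)
print(record["category"], record["device"], record["cumulative"])
```

Using @search@ rather than @match@ means any timestamp or job prefix in front of "crunchstat:" is skipped rather than rejected.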