Feature #22157

Updated by Peter Amstutz 3 months ago

It would be very helpful to have the high water marks for CPU and RAM usage of a container recorded on the container record. There are two main uses I can think of:

 * Detecting likely OOM conditions. Right now, arvados-cwl-runner parses crunch-run.log to get this information and reports it as a runtime status warning. This means that if arvados-cwl-runner itself goes OOM, the condition is not flagged to the user.
 * Detecting under-utilization. If CPU or RAM usage stays below 50% for the whole run, the step is a candidate for reducing its resource request. However, before changing the workflow, it is important to know whether the step consistently under-utilizes its resources or whether its usage is highly variable. Having this in the database makes the information much more accessible than having to parse logs.
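
As a rough illustration of the under-utilization check, here is a minimal sketch of how recorded high water marks might be used once they are available. It assumes the 50% threshold is measured against the requested amount, and the function name, arguments, and return labels are all hypothetical; nothing here reflects an existing Arvados field or API.

```python
def classify_utilization(peaks, requested, threshold=0.5):
    """Classify a step from the peak usage of its past runs.

    peaks:     hypothetical list of high water marks, one per past run,
               in the same unit as `requested` (e.g. bytes of RAM or cores).
    requested: the resource request for the step.
    threshold: fraction of the request below which a run counts as
               under-utilizing (0.5 mirrors the "< 50%" rule of thumb).
    """
    if not peaks:
        return "unknown"  # no history recorded yet
    low = [p / requested < threshold for p in peaks]
    if all(low):
        return "reduce"    # consistently under-utilized in every run
    if any(low):
        return "variable"  # some runs low, some not: leave the request alone
    return "ok"            # every run used at least `threshold` of the request


# Example: three past runs of a step that requested 4 GiB of RAM.
gib = 1024 ** 3
print(classify_utilization([1.0 * gib, 1.2 * gib, 0.9 * gib], 4 * gib))
```

The point of separating "reduce" from "variable" is the one made above: a single low run is not enough to justify changing the workflow, so the check only recommends a smaller request when every recorded run stayed under the threshold.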
