Bug #4882

closed

[Crunch] crunchstat reports surprising CPU usage when container appears and disappears

Added by Bryan Cosca about 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Story points:
-

Description

Background

source:services/crunchstat/crunchstat.go has a list of locations where cgroup accounting files are likely to be found. Historically this was needed because the location varies from one kernel/system to another. Each time crunchstat reads stats, it tries each location in turn until it finds one that exists. When the target container is up, this works fine, but the last two locations give accounting information for the host instead of the container. Often, crunchstat tries to read accounting data before the container has come up or after the container has been destroyed (in the interval before it gets notified that the docker process has exited). Naturally, the transition between host stats and container stats results in reporting wild deltas.
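The lookup described above can be sketched roughly as follows. This is an illustration only, not the code in crunchstat.go: the candidate type, the hostLevel flag, the firstExisting function, and the exists callback are all hypothetical names introduced here, and the real list of locations is not reproduced.

```go
package main

// candidate is one place a cgroup accounting file might live.
// Hypothetical type: hostLevel marks locations that describe the
// whole host rather than the target container.
type candidate struct {
	path      string
	hostLevel bool
}

// firstExisting walks the candidates in order and returns the first
// one for which exists reports true. The exists callback stands in
// for a filesystem check so the sketch stays self-contained.
func firstExisting(cands []candidate, exists func(string) bool) (candidate, bool) {
	for _, c := range cands {
		if exists(c.path) {
			return c, true
		}
	}
	return candidate{}, false
}
```

With a lookup of this shape, the container-level entries win while the container is up, but before it starts or after it is destroyed the search falls through to a host-level entry, so two consecutive samples can come from unrelated counters.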

The transition from one stat file to another causes an "error" log message which is easy to misinterpret as an error affecting the job, or explaining some later job failure -- see #5523.

Proposed fix

One or more of:
  • Don't report host-level stats if container-level stats have ever been reported.
  • Don't report host-level stats ever.
  • When the previous sample point and the current sample point come from different accounting files, don't print a delta.
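
A minimal sketch of how the first and third options might combine, assuming each sample records which accounting file it came from. The sample and reporter types, their fields, and the delta method are hypothetical and are not taken from crunchstat.go.

```go
package main

// sample is one reading of a cumulative CPU counter.
// Hypothetical type: source is the accounting file the value was
// read from, hostLevel says whether that file describes the host.
type sample struct {
	source    string
	hostLevel bool
	cpuTicks  int64
}

// reporter remembers the previous accepted sample and whether any
// container-level sample has been seen.
type reporter struct {
	prev          *sample
	seenContainer bool
}

// delta returns the tick delta to report for s, or false when
// nothing should be printed for this sample.
func (r *reporter) delta(s sample) (int64, bool) {
	// Option 1: once container stats have been reported, ignore
	// host-level samples entirely and keep the previous sample.
	if s.hostLevel && r.seenContainer {
		return 0, false
	}
	if !s.hostLevel {
		r.seenContainer = true
	}
	prev := r.prev
	r.prev = &s
	// Option 3: a delta across two different accounting files is
	// meaningless, so skip it (also covers the first sample).
	if prev == nil || prev.source != s.source {
		return 0, false
	}
	return s.cpuTicks - prev.cpuTicks, true
}
```

The second option (never report host-level stats) would amount to dropping every sample with hostLevel set, which makes the seenContainer bookkeeping unnecessary.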

Bug report

Originally reported as "Log graph goes out of bounds":

About five-sixths of the way through the graph, you can see the yellow point go off the screen. Suggestion: maybe the graph should rescale in real time... Hmm, I'm not sure, because then the graph would keep rescaling and would probably get annoying. It would still be useful to at least see that point, though.


Files

log_graph_oob.png (56.3 KB), Bryan Cosca, 12/29/2014 08:48 PM

Subtasks 1 (0 open, 1 closed)

Task #5932: Review 4882-no-host-when-container (Resolved, Tom Clegg, 12/29/2014)

Related issues

Related to Arvados - Bug #5523: [Crunch] crunchstat should not report errors during normal timing races (New, Tom Clegg)