Bug #5523

open

[Crunch] crunchstat should not report errors during normal timing races

Added by Peter Amstutz about 9 years ago. Updated 2 months ago.

Status: New
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: 0.5
Release:
Release relationship: Auto

Description

Container stat files appear and disappear in normal operation. In the "normal" cases described below, such events should not be logged at all, let alone as errors.

We expect zero or one episode of "cannot find stats file" when cidfile != "" and we're collecting stats for the first time.
  • If the first collection attempt for a given statistic results in "cannot find file", we should block in OpenStatFile and poll quickly over a short interval (say, every 100ms, max 1s) because we probably just won the race with the container setup process.
  • If the stat files don't show up within that max interval (~1s) it means something is wrong, and this should (still) be logged.
We expect zero or one episode of "stats file disappeared" when cidfile != "", if we happen to poll between container shutdown and the exit of crunchstat's child process. For a given statistic:
  • The first time this occurs, we should not log anything.
  • The second time this occurs, we should log "warning: stats file disappeared {duration} ago, but child has not exited".
  • The third+ time this occurs, we should not log anything.
  • If the stat file reappears, we should reset the "went missing" counter to zero.

Subtasks: 2 (1 open, 1 closed)

Task #5918: Review 5523-stats-error (Resolved, Tom Clegg, 05/07/2015)
Task #5938: Handle normal container startup and shutdown races without logging an error/notice or missing the first interval (New, Tom Clegg, 05/07/2015)

Related issues

Related to Arvados - Bug #4882: [Crunch] crunchstat reports surprising CPU usage when container appears and disappears (Resolved, Tom Clegg, 12/29/2014)
