Project

General

Profile

Actions

Feature #8016

closed

[Crunch2] Report statistics

Added by Peter Amstutz almost 9 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
1.0
Release:
Release relationship:
Auto

Subtasks 3 (0 open3 closed)

Task #9481: Refactor crunchstat as a moduleResolvedTom Clegg06/30/2016Actions
Task #9482: Use crunchstat in crunch-run to report container resource usageResolvedTom Clegg06/30/2016Actions
Task #9484: Review 8016-crunchrun-crunchstatResolvedRadhika Chippada07/01/2016Actions
Actions #1

Updated by Peter Amstutz over 8 years ago

  • Assigned To set to Peter Amstutz
Actions #2

Updated by Tom Clegg over 8 years ago

Refactor crunchstat as a module, so crunch-run can start up a goroutine that monitors the correct cgroup (based on what docker client library tells us) and writes the appropriate log events/collections.

Actions #3

Updated by Tom Clegg over 8 years ago

  • Target version set to 2016-07-06 sprint
Actions #4

Updated by Tom Clegg over 8 years ago

  • Assigned To changed from Peter Amstutz to Tom Clegg
Actions #5

Updated by Tom Clegg over 8 years ago

  • Story points set to 1.0
Actions #6

Updated by Tom Clegg over 8 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Tom Clegg over 8 years ago

8016-crunchrun-crunchstat @ c9e6477

Actions #8

Updated by Radhika Chippada over 8 years ago

lib/crunchstat/crunchstat.go

  • line 314: missing err check
  • In doCPUStats, need any consideration for the case when getCPUCount return 0 (due to errors caught)?
  • How / where is lastCPUSample.hasData being set?
  • "If CID is empty, wait for it to appear in CIDFile. Return true if we get it before r.done indicates someone called Stop" => "If CID is empty, wait for it to appear in CIDFile. Return true if we get it before r.done which indicates someone called Stop"? Similar comment at line 409.
  • Should we be returning true if CIDFile is empty and CID is not available here?
    func (r *Reporter) waitForCIDFile() bool {
        if r.CID != "" || r.CIDFile == "" {
            return true
        }
    
  • In Reporter struct, can we call Poll as PollInterval?

services/crunch-run/crunchrun.go

  • "resource usage statistics reporting period" => can you please expand this a little? I am understadning this as resource usage stats will be reported at regular intervals given?
Actions #9

Updated by Tom Clegg over 8 years ago

Radhika Chippada wrote:

lib/crunchstat/crunchstat.go

  • line 314: missing err check

Fixed. (The "return zero" outcome is the same but better to do it explicitly.)

  • In doCPUStats, need any consideration for the case when getCPUCount return 0 (due to errors caught)?

We'll report "0 cpus" and the downstream analysis will have to cope with it.

I think this is preferable to dropping the rest of the information for this sample period. It probably means either that method never works on this system, or something terrible has happened like we're running out of file descriptors (and the times when the worker host is having trouble are probably the times when we are most eager to save all the information we can).

  • How / where is lastCPUSample.hasData being set?

This was set by nextSample := cpuSample{true, ....}. I've fixed that to use named fields, so it's less mysterious.

-       nextSample := cpuSample{true, time.Now(), 0, 0, r.getCPUCount()}
        var userTicks, sysTicks int64
        fmt.Sscanf(string(b), "user %d\nsystem %d", &userTicks, &sysTicks)
        userHz := float64(C.sysconf(C._SC_CLK_TCK))
-       nextSample.user = float64(userTicks) / userHz
-       nextSample.sys = float64(sysTicks) / userHz
+       nextSample := cpuSample{
+               hasData:    true,
+               sampleTime: time.Now(),
+               user:       float64(userTicks) / userHz,
+               sys:        float64(sysTicks) / userHz,
+               cpus:       r.getCPUCount(),
+       }
  • "If CID is empty, wait for it to appear in CIDFile. Return true if we get it before r.done indicates someone called Stop" => "If CID is empty, wait for it to appear in CIDFile. Return true if we get it before r.done which indicates someone called Stop"? Similar comment at line 409.

This was supposed to read as "before r.done indicates (by being closed) that someone called Stop" but I see how it doesn't scan well. Updated to "...before we learn (via r.done) that someone called Stop".

  • Should we be returning true if CIDFile is empty and CID is not available here?
    [...]

Yes. If CIDFile is empty, there's no CIDFile to wait for (returning true just means nobody called Stop before we returned).

If both are empty, we're going to collect host stats instead of container stats.

  • In Reporter struct, can we call Poll as PollInterval?

Sure. I went with PollPeriod since it's more suggestive (in case anyone cares) that we're doing "poll once every X seconds", rather than "poll, then wait X seconds, then poll again".

services/crunch-run/crunchrun.go

  • "resource usage statistics reporting period" => can you please expand this a little? I am understadning this as resource usage stats will be reported at regular intervals given?

Changed to "sampling period for periodic resource usage reporting" ... better?

Now at cf1f493

Actions #10

Updated by Radhika Chippada over 8 years ago

LGTM

Actions #11

Updated by Tom Clegg over 8 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:257d60253246952b435cea23b1912af80ea2c6d6.

Actions

Also available in: Atom PDF