Feature #12444

closed

Compute nodes monitor the tmpdir space over time

Added by Bryan Cosca over 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 1.0
Release:
Release relationship: Auto

Subtasks 1 (0 open, 1 closed)

Task #13842: Review 12444-tmpdir-monitoring (Resolved, Peter Amstutz, 10/11/2017)

Related issues

Related to Arvados - Feature #13913: Crunchstat-summary graphs tmpdir usage (Resolved, Tom Morris, 07/09/2019)
Actions #1

Updated by Bryan Cosca over 6 years ago

  • Tracker changed from Bug to Feature

Shoot, I think this should be in a different backlog, but I don't have sufficient permissions to move it.

Actions #2

Updated by Tom Morris over 6 years ago

  • Project changed from 40 to Arvados
  • Target version set to To Be Groomed
Actions #3

Updated by Lucas Di Pentima over 6 years ago

  • Enhance crunchstat so that available free space is periodically logged along with the already present mem, cpu & i/o stats.
  • Go's syscall package provides a Statfs function, which avoids having to call the df command.
  • node-info logs already record the available space and inodes at the start, so this addition would complement that information.
Actions #4

Updated by Tom Morris over 6 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 1.0
Actions #5

Updated by Tom Clegg over 6 years ago

Should log all three figures (available, used, total). Generally available+used<total because the "available only to root" portion is not counted as available. Reporting tools can decide the most useful way to report.

Actions #6

Updated by Tom Morris almost 6 years ago

  • Target version changed from Arvados Future Sprints to 2018-08-01 Sprint
Actions #7

Updated by Tom Morris almost 6 years ago

  • Assigned To set to Lucas Di Pentima
Actions #8

Updated by Lucas Di Pentima almost 6 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Lucas Di Pentima almost 6 years ago

Updates at e4c31590b - branch 12444-tmpdir-monitoring
Test run: https://ci.curoverse.com/job/developer-run-tests/821/

Adds tmpdir stats to crunchstat reporting: available, used & total with usage increments.

Actions #10

Updated by Lucas Di Pentima almost 6 years ago

There's an sdk/python test failure; looking into it.

Actions #11

Updated by Lucas Di Pentima almost 6 years ago

As suggested by Peter, rebasing on latest master fixed the issue.

Now at b211e857d304f7fbe8787d2b65a307da841d047b
Test run: https://ci.curoverse.com/job/developer-run-tests-sdk-python-ruby/108/

Actions #12

Updated by Peter Amstutz almost 6 years ago

    err := syscall.Statfs("/tmp", &s)

Shouldn't be hardcoded. By default it should use $TMPDIR, but it would be better to pass it in. Crunch-run should be updated to pass in runner.parentTemp.

        total:      s.Blocks * bs,
        used:       (s.Blocks - s.Bavail) * bs,
        available:  s.Bavail * bs,

Should be used: (s.Blocks - s.Bfree) * bs

(because Bavail < Bfree)
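
To make the difference concrete, here is a small sketch with made-up block counts. The usage type and computeUsage function are hypothetical; only the formulas come from the review comment. Because the root-reserved blocks count as free but not as available, available + used comes out smaller than total.

```go
package main

import "fmt"

// usage holds the three figures to report, in bytes.
type usage struct {
	total, used, available uint64
}

// computeUsage derives the figures from statfs-style block counts.
// used is based on bfree (all free blocks), not bavail (free blocks
// available to non-root users).
func computeUsage(blocks, bfree, bavail, bsize uint64) usage {
	return usage{
		total:     blocks * bsize,
		used:      (blocks - bfree) * bsize,
		available: bavail * bsize,
	}
}

func main() {
	// Made-up figures: 1000 blocks of 4096 bytes, 400 free,
	// of which 350 are available to non-root users.
	u := computeUsage(1000, 400, 350, 4096)
	reserved := u.total - u.used - u.available
	fmt.Printf("total=%d used=%d available=%d reserved=%d\n",
		u.total, u.used, u.available, reserved)
	// prints: total=4096000 used=2457600 available=1433600 reserved=204800
}
```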

    r.Logger.Printf("tmpdir available:%d used:%d total:%d%s\n",
        nextSample.available, nextSample.used, nextSample.total, delta)

I don't think this formatting is consistent with the other crunchstat lines, which have the number first, then the name. It should be something like:

    r.Logger.Printf("statfs %d available %d used %d total%s\n", 
        nextSample.available, nextSample.used, nextSample.total, delta)
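
For illustration, the suggested format string produces a line like the one below. The sample type and formatLine function are hypothetical, the byte counts are made up, and delta is left empty because its exact format is not shown in this ticket.

```go
package main

import "fmt"

// sample holds one statfs reading, in bytes.
type sample struct {
	available, used, total uint64
}

// formatLine renders a reading in number-then-name order, using the
// format string proposed in the review comment.
func formatLine(s sample, delta string) string {
	return fmt.Sprintf("statfs %d available %d used %d total%s",
		s.available, s.used, s.total, delta)
}

func main() {
	s := sample{available: 1433600, used: 2457600, total: 4096000}
	fmt.Println(formatLine(s, ""))
	// prints: statfs 1433600 available 2457600 used 4096000 total
}
```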
Actions #13

Updated by Lucas Di Pentima almost 6 years ago

Updates at 85f6919fa
Test run: https://ci.curoverse.com/job/developer-run-tests/824/

Addressed above comments and added a notice on the log to report which directory is being monitored.

Actions #14

Updated by Peter Amstutz almost 6 years ago

Lucas Di Pentima wrote:

Updates at 85f6919fa
Test run: https://ci.curoverse.com/job/developer-run-tests/824/

Addressed above comments and added a notice on the log to report which directory is being monitored.

This LGTM.

Could you file a follow-on ticket to update crunchstat-summary to graph disk usage?

Actions #15

Updated by Lucas Di Pentima almost 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions #16

Updated by Lucas Di Pentima over 5 years ago

  • Related to Feature #13913: Crunchstat-summary graphs tmpdir usage added
Actions #17

Updated by Tom Morris over 5 years ago

  • Release set to 13