Feature #12444

Compute nodes monitor the tmpdir space over time

Added by Bryan Cosca over 3 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
10/11/2017
Due date:
% Done:
100%
Estimated time:
(Total: 0.00 h)
Story points:
1.0
Release:
Release relationship:
Auto

Subtasks

Task #13842: Review 12444-tmpdir-monitoring (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Feature #13913: Crunchstat-summary graphs tmpdir usage (Resolved, 07/09/2019)

Associated revisions

Revision f5d7521c
Added by Lucas Di Pentima over 2 years ago

Merge branch '12444-tmpdir-monitoring'
Closes #12444

Arvados-DCO-1.1-Signed-off-by: Lucas Di Pentima <>

History

#1 Updated by Bryan Cosca over 3 years ago

  • Tracker changed from Bug to Feature

Shoot, I think this should be in a different backlog, but I don't have sufficient permissions to move it.

#2 Updated by Tom Morris about 3 years ago

  • Project changed from OPS to Arvados
  • Target version set to To Be Groomed

#3 Updated by Lucas Di Pentima about 3 years ago

  • Enhance crunchstat so that free space is periodically logged along with the existing mem, cpu & i/o stats.
  • To avoid calling the df command, we can use the Statfs function from Go's syscall package.
  • node-info logs already record the available space and inodes at the start, so this addition would complement that information.

#4 Updated by Tom Morris about 3 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 1.0

#5 Updated by Tom Clegg about 3 years ago

Should log all three figures (available, used, total). Generally available + used < total, because the portion available only to root is not counted as available. Reporting tools can decide the most useful way to report.
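To spell out why the three figures don't add up, here is a short derivation using the statfs(2) fields, assuming used is counted from free blocks: B = f_blocks (total blocks), F = f_bfree (free blocks), A = f_bavail (blocks available to unprivileged users) and s = block size in bytes.

```latex
\begin{aligned}
\text{total} &= B\,s, \qquad \text{used} = (B - F)\,s, \qquad \text{available} = A\,s,\\
\text{available} + \text{used} &= (A + B - F)\,s = \text{total} - (F - A)\,s \le \text{total}.
\end{aligned}
```

The gap (F - A)s is the free space reserved for root; the sum equals total only when nothing is reserved.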

#6 Updated by Tom Morris over 2 years ago

  • Target version changed from Arvados Future Sprints to 2018-08-01 Sprint

#7 Updated by Tom Morris over 2 years ago

  • Assigned To set to Lucas Di Pentima

#8 Updated by Lucas Di Pentima over 2 years ago

  • Status changed from New to In Progress

#9 Updated by Lucas Di Pentima over 2 years ago

Updates at e4c31590b - branch 12444-tmpdir-monitoring
Test run: https://ci.curoverse.com/job/developer-run-tests/821/

Adds tmpdir stats to crunchstat reporting: available, used & total with usage increments.

#10 Updated by Lucas Di Pentima over 2 years ago

There's an sdk/python test failure; looking into it.

#11 Updated by Lucas Di Pentima over 2 years ago

As suggested by Peter, rebasing on latest master fixed the issue.

Now at b211e857d304f7fbe8787d2b65a307da841d047b
Test run: https://ci.curoverse.com/job/developer-run-tests-sdk-python-ruby/108/

#12 Updated by Peter Amstutz over 2 years ago

    err := syscall.Statfs("/tmp", &s)

Shouldn't be hardcoded. By default it should use $TMPDIR, but it would be better to pass it in. Crunch-run should be updated to pass in runner.parentTemp.

        total:      s.Blocks * bs,
        used:       (s.Blocks - s.Bavail) * bs,
        available:  s.Bavail * bs,

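The used figure should be (s.Blocks - s.Bfree) * bs

(because Bavail < Bfree whenever blocks are reserved for root, so subtracting Bavail would count the reserved blocks as used)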
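The corrected arithmetic can be checked on made-up numbers. This sketch takes raw block counts (mirroring Statfs_t.Blocks, .Bfree and .Bavail) so it runs without touching a real filesystem; usageFromBlocks is a hypothetical name.

```go
// Sketch only: usageFromBlocks is a hypothetical helper.
package main

import "fmt"

// usage holds filesystem figures in bytes.
type usage struct {
	available, used, total uint64
}

// usageFromBlocks converts statfs block counts to byte figures.
func usageFromBlocks(blocks, bfree, bavail, bsize uint64) usage {
	return usage{
		available: bavail * bsize,          // usable by unprivileged processes
		used:      (blocks - bfree) * bsize, // actually occupied; excludes reserved free blocks
		total:     blocks * bsize,
	}
}

func main() {
	// Hypothetical filesystem: 1000 blocks of 4096 bytes, 400 free,
	// 50 of which are reserved for root (so 350 are available).
	u := usageFromBlocks(1000, 400, 350, 4096)
	fmt.Println(u.available, u.used, u.total) // 1433600 2457600 4096000
}
```

With the original (Blocks - Bavail) formula, used would come out as 650 blocks instead of 600, wrongly counting the 50 reserved blocks as occupied.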

    r.Logger.Printf("tmpdir available:%d used:%d total:%d%s\n",
        nextSample.available, nextSample.used, nextSample.total, delta)

I don't think this formatting is consistent with the other crunchstat lines (which put the number first, then the name). It should be something like:

    r.Logger.Printf("statfs %d available %d used %d total%s\n", 
        nextSample.available, nextSample.used, nextSample.total, delta)

#13 Updated by Lucas Di Pentima over 2 years ago

Updates at 85f6919fa
Test run: https://ci.curoverse.com/job/developer-run-tests/824/

Addressed the above comments and added a log notice reporting which directory is being monitored.

#14 Updated by Peter Amstutz over 2 years ago

Lucas Di Pentima wrote:

Updates at 85f6919fa
Test run: https://ci.curoverse.com/job/developer-run-tests/824/

Addressed the above comments and added a log notice reporting which directory is being monitored.

This LGTM.

Could you file a follow-on ticket to update crunchstat-summary to graph disk usage?

#15 Updated by Lucas Di Pentima over 2 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

#16 Updated by Lucas Di Pentima over 2 years ago

  • Related to Feature #13913: Crunchstat-summary graphs tmpdir usage added

#17 Updated by Tom Morris about 2 years ago

  • Release set to 13
