Feature #12444
closed
Compute nodes monitor the tmpdir space over time
Added by Bryan Cosca about 7 years ago. Updated about 6 years ago.
Updated by Bryan Cosca about 7 years ago
- Tracker changed from Bug to Feature
Shoot, I think this should be in a different backlog, but I don't have sufficient permissions to move it.
Updated by Tom Morris about 7 years ago
- Project changed from 40 to Arvados
- Target version set to To Be Groomed
Updated by Lucas Di Pentima almost 7 years ago
- Enhance crunchstat so that available free space is periodically logged along with the already present mem, cpu & i/o stats.
- To avoid having to call the df command, golang provides a syscall package with an appropriate Statfs function.
- node-info logs already record the available space and i-nodes at the start, so this addition would complement that information.
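A minimal sketch of the Statfs approach described above, assuming a Linux compute node. The helper name availableBytes is hypothetical and is not taken from the crunchstat source; only syscall.Statfs and the Statfs_t fields are real Go APIs.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// availableBytes is a hypothetical helper: it asks the kernel for filesystem
// statistics on path and returns the space available to unprivileged users.
func availableBytes(path string) (uint64, error) {
	var s syscall.Statfs_t
	if err := syscall.Statfs(path, &s); err != nil {
		return 0, err
	}
	// Bavail counts blocks available to non-root users; Bsize is the block size.
	return s.Bavail * uint64(s.Bsize), nil
}

func main() {
	dir := os.TempDir()
	n, err := availableBytes(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "statfs failed:", err)
		os.Exit(1)
	}
	fmt.Printf("%s: %d bytes available\n", dir, n)
}
```

Calling a function like this from the existing periodic sampling loop would avoid spawning a df process for every sample.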
Updated by Tom Morris almost 7 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 1.0
Updated by Tom Clegg almost 7 years ago
Should log all three figures (available, used, total). Generally available+used<total because the "available only to root" portion is not counted as available. Reporting tools can decide the most useful way to report.
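As a worked illustration of why the three figures do not add up, here is a small sketch that converts statfs-style block counts into bytes. The type fsSample, the function sampleFromCounts, and the block counts are made up for illustration; the arithmetic assumes "used" is computed from the free-block count and "available" from the non-root count.

```go
package main

import "fmt"

// fsSample holds the three figures, in bytes (hypothetical type).
type fsSample struct {
	available, used, total uint64
}

// sampleFromCounts converts block counts into bytes.
// blocks: total blocks; bfree: free blocks including the root-reserved ones;
// bavail: free blocks usable by unprivileged users; bsize: block size in bytes.
func sampleFromCounts(blocks, bfree, bavail, bsize uint64) fsSample {
	return fsSample{
		available: bavail * bsize,
		used:      (blocks - bfree) * bsize,
		total:     blocks * bsize,
	}
}

func main() {
	// Illustrative numbers: 1000 blocks of 4096 bytes, 400 free, 50 of those reserved for root.
	s := sampleFromCounts(1000, 400, 350, 4096)
	reserved := s.total - s.used - s.available
	fmt.Println(s.available, s.used, s.total, reserved)
	// prints: 1433600 2457600 4096000 204800
}
```

The gap between total and available+used is exactly the root-reserved portion, which is why logging all three leaves the choice of presentation to the reporting tools.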
Updated by Tom Morris over 6 years ago
- Target version changed from Arvados Future Sprints to 2018-08-01 Sprint
Updated by Lucas Di Pentima over 6 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima over 6 years ago
Updates at e4c31590b - branch 12444-tmpdir-monitoring
Test run: https://ci.curoverse.com/job/developer-run-tests/821/
Adds tmpdir stats to crunchstat reporting: available, used & total with usage increments.
Updated by Lucas Di Pentima over 6 years ago
There's an sdk/python test failure; looking into it.
Updated by Lucas Di Pentima over 6 years ago
As suggested by Peter, rebasing on latest master fixed the issue.
Now at b211e857d304f7fbe8787d2b65a307da841d047b
Test run: https://ci.curoverse.com/job/developer-run-tests-sdk-python-ruby/108/
Updated by Peter Amstutz over 6 years ago
err := syscall.Statfs("/tmp", &s)
Shouldn't be hardcoded. By default it should use $TMPDIR, but it would be better to pass it in. Crunch-run should be updated to pass in runner.parentTemp.
total: s.Blocks * bs, used: (s.Blocks - s.Bavail) * bs, available: s.Bavail * bs,
Should be used: (s.Blocks - s.Bfree) * bs (because Bavail <= Bfree: Bfree also counts the blocks reserved for root).
r.Logger.Printf("tmpdir available:%d used:%d total:%d%s\n", nextSample.available, nextSample.used, nextSample.total, delta)
I don't think this formatting is consistent with the other crunchstat lines (which have the number first, then the name). It should be something like:
r.Logger.Printf("statfs %d available %d used %d total%s\n", nextSample.available, nextSample.used, nextSample.total, delta)
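The two non-arithmetic suggestions above (no hardcoded path, number-first log format) could be sketched roughly as follows. The names monitoredDir and statfsLine are hypothetical and not from the branch under review, and the delta suffix from the original Printf call is left out for brevity.

```go
package main

import (
	"fmt"
	"os"
)

// monitoredDir (hypothetical) prefers a directory passed in by the caller,
// e.g. from crunch-run, and otherwise falls back to os.TempDir(), which on
// Unix returns $TMPDIR if it is non-empty and /tmp otherwise.
func monitoredDir(explicit string) string {
	if explicit != "" {
		return explicit
	}
	return os.TempDir()
}

// statfsLine (hypothetical) formats a sample with each number before its
// name, following the format proposed in the review comment.
func statfsLine(available, used, total uint64) string {
	return fmt.Sprintf("statfs %d available %d used %d total", available, used, total)
}

func main() {
	fmt.Println("monitoring", monitoredDir(""))
	fmt.Println(statfsLine(1433600, 2457600, 4096000))
}
```

Passing the directory in explicitly keeps the sampling code independent of the environment, with $TMPDIR acting only as a default.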
Updated by Lucas Di Pentima over 6 years ago
Updates at 85f6919fa
Test run: https://ci.curoverse.com/job/developer-run-tests/824/
Addressed above comments and added a notice on the log to report which directory is being monitored.
Updated by Peter Amstutz over 6 years ago
Lucas Di Pentima wrote:
Updates at 85f6919fa
Test run: https://ci.curoverse.com/job/developer-run-tests/824/
Addressed above comments and added a notice on the log to report which directory is being monitored.
This LGTM.
Could you file a follow-on ticket to update crunchstat-summary to graph disk usage?
Updated by Lucas Di Pentima over 6 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|f5d7521ca506d63f631c603938cac5f40663bcca.
Updated by Lucas Di Pentima over 6 years ago
- Related to Feature #13913: Crunchstat-summary graphs tmpdir usage added