Feature #4233
Updated by Tom Clegg about 10 years ago
Scope of this story:
* A graph appears on the log tab of a job that is running or queued.
** When a websocket message arrives with a "stderr" log entry, in addition to appending the text to the existing log viewer window, split the log text into lines (there can be multiple lines per log entry), grep for "crunchstat:" lines, and update the graph accordingly.
** For now, ignore the cumulative stats (everything before the "--" delimiter) and just graph the "interval" stats.
** Each log line is expected to have a numeric task identifier, T (it's "0" for the first task).
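** Each crunchstat sample is expected to have the format "crunchstat: L X1 L1 X2 L2 ... -- interval Dt seconds D1 L1 D2 L2 ..." where L is the series label, X* are the cumulative values, L1, L2, ... are the labels of the individual values, Dt is the interval length in seconds, and D* are the deltas for that interval.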
** Each crunchstat sample should be rendered on the graph as a data point @(T, sum(Dx)/Dt)@. The legend for the series should be "L". The tooltip for the data point should list the individual D1 L1 etc. ("123 tx / 345 rx"), as well as sum(Dx)/Dt, the task id T ("task 0"), and the label L ("net:eth0").
** The graph should have a scale of ~1 pixel per second on the X axis: this will show 10-20 minutes of activity. Older activity will scroll off the left edge. For now, there's no need to support scrolling to see older data points, unless of course this turns out to be trivial to implement.
** Ignore lines with no @" -- interval "@.
** Probably best to use Morris. There is an existing example in @app/assets/javascripts/keep_disks.js.coffee@.
* Provide a test helper (in apps/workbench/lib?) that replays crunch job logs.
** Accept arguments: path to plain text file, speed multiplier, optional job UUID.
** Read log messages from the specified file.
** Parse the timestamps.
** Replace the job UUID in the text file with the job UUID specified by the caller, if any.
** Copy the modified log entry into the Logs table using @Log.create()@.
** Introduce delays between log messages such that the logs are replayed faster (by the given speed multiplier) than suggested by the timestamps in the text file. If the speed multiplier given is zero or negative, don't delay at all.
** Do not modify the timestamps in the log text, though. The log record itself will have a current timestamp which will disagree with the text, but that should be fine.
** Raise an exception if RAILS_ENV is not "test" or "development".
** Make a rake task so the helper can be invoked from the command line as @"rake replaylog log.txt 2.0"@.
** Note on the [[Hacking Workbench]] wiki page how to use the helper to inject logs while hacking the graphing code (or the text log viewer, for that matter).
* A test case uses the helper to inject logs and confirm that the graph gets updated as a result.
** Add a real log file (or an excerpt of one) to the source tree for use in tests. (Check with Abram / science team for a good public example that has multiple concurrent tasks and some periods of interesting activity.)
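A rough sketch of the replay loop described for the test helper, to make the delay and UUID-replacement rules concrete. Everything named here is an assumption rather than existing Workbench code: @replay_log@, the @sleeper@ hook (there so tests need not really sleep), the timestamp layout at the start of each line, and the job UUID pattern. The real helper would call @Log.create()@ where this sketch yields.

```ruby
require "time"

# Assumed layout of the timestamp that starts each log line.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H:%M:%S"
# Assumed shape of a job UUID inside the log text.
JOB_UUID_PATTERN = /\b[a-z0-9]{5}-8i9sb-[a-z0-9]{15}\b/

# Hypothetical helper: yields each log line, optionally with the job UUID
# replaced, after waiting for the scaled gap since the previous line.
def replay_log(lines, speed, job_uuid: nil,
               env: ENV.fetch("RAILS_ENV", "development"),
               sleeper: method(:sleep))
  unless %w[test development].include?(env)
    raise ArgumentError, "refusing to replay logs in #{env.inspect}"
  end

  previous = nil
  lines.each do |line|
    text = line.chomp
    stamp = text.split(" ", 2).first
    now = Time.strptime(stamp, TIMESTAMP_FORMAT)

    # A zero or negative multiplier means no delay at all.
    if previous && speed > 0
      gap = now - previous
      sleeper.call(gap / speed) if gap > 0
    end
    previous = now

    # Only the UUID changes; timestamps in the text are left alone.
    text = text.gsub(JOB_UUID_PATTERN, job_uuid) if job_uuid
    yield text # the real helper would call Log.create() here
  end
end
```

With a multiplier of 2.0, two lines stamped ten seconds apart are emitted five seconds apart. Passing a lambda as @sleeper@ lets a test record the delays instead of waiting for them.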
Future work will likely include:
* Graph stats from multiple jobs at once (e.g., on the "show pipeline" page, when more than one job in the pipeline is running)
* View historical logs (the data will come from plain text, not websockets)
* Restrict the series to a subset of tasks or statistics (for now, a job with many concurrent tasks will have too much data to digest, although "near zero" and "not near zero" should still be possible to distinguish, which is usually the first question).
* Group the statistics by node.
* Group the statistics by label (e.g., total CPU load across all running tasks).
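For reference, the crunchstat sample format and the @sum(Dx)/Dt@ value from the scope above can be sketched as below. This is Ruby for illustration only, since the Workbench graphing code itself would be CoffeeScript/JavaScript. The names @CrunchstatSample@, @parse_crunchstat@ and @samples_from_entry@ are hypothetical, the task id is passed in separately because this story does not say where it sits in the log line, and the sample line in the usage note is made up.

```ruby
# Hypothetical container for one parsed "interval" sample.
CrunchstatSample = Struct.new(:task, :label, :interval, :deltas) do
  # Y value for the graph: sum(Dx)/Dt.
  def rate
    deltas.sum { |value, _name| value } / interval
  end

  # Tooltip fragment such as "123 tx / 345 rx".
  def delta_summary
    deltas.map { |value, name| "#{value.to_s.sub(/\.0\z/, '')} #{name}" }.join(" / ")
  end
end

# Returns a CrunchstatSample, or nil when the text is not a crunchstat
# line or has no "-- interval" part (cumulative stats are ignored).
def parse_crunchstat(task, text)
  match = text.match(/crunchstat: (\S+)\s.*?-- interval (\S+) seconds\s*(.*)/)
  return nil unless match

  label, dt, rest = match.captures
  interval = Float(dt, exception: false)
  return nil if interval.nil? || interval <= 0

  # The tail alternates "D1 L1 D2 L2 ...": pair each delta with its label.
  deltas = rest.split.each_slice(2).filter_map do |value, name|
    number = Float(value, exception: false)
    [number, name] if number && name
  end
  CrunchstatSample.new(task, label, interval, deltas)
end

# A log entry can hold several lines; keep only the parseable samples.
def samples_from_entry(task, entry_text)
  entry_text.each_line.filter_map { |line| parse_crunchstat(task, line.chomp) }
end
```

For a made-up line such as @crunchstat: net:eth0 1000 tx 2000 rx -- interval 10.0000 seconds 123 tx 345 rx@, @rate@ works out to (123 + 345) / 10 = 46.8 and @delta_summary@ gives "123 tx / 345 rx". Grouping by label or by node later on would amount to a @group_by@ over these samples.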