<p><strong>Arvados - Idea #7901: [Crunch] Script to report maximum resource utilization from a job log</strong></p>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33057">2015-12-02 20:01:54 UTC</a></p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/33057/diff?detail_id=32467">diff</a>)</li></ul>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33062">2015-12-02 20:08:15 UTC</a></p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/33062/diff?detail_id=32473">diff</a>)</li></ul>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33064">2015-12-02 20:13:05 UTC</a></p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/33064/diff?detail_id=32476">diff</a>)</li></ul>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33066">2015-12-02 20:15:26 UTC</a></p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/33066/diff?detail_id=32478">diff</a>)</li></ul>
<p>Brett Smith (brett.smith@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33067">2015-12-02 20:17:35 UTC</a></p>
<ul><li><strong>Story points</strong> set to <i>1.0</i></li></ul>
<p>Brett Smith (brett.smith@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33072">2015-12-02 20:27:21 UTC</a></p>
<ul><li><strong>Assigned To</strong> set to <i>Tom Clegg</i></li></ul>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33376">2015-12-10 15:44:43 UTC</a></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33389">2015-12-10 20:57:14 UTC</a></p>
<p>7901-crunchstat-summary @ <a class="changeset" title="7901: Add crunchstat-summary program." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/1d4f047fab325e9c6844b5550747e9a815e6654c">1d4f047</a> and <a class="changeset" title="7901: Add crunchstat-summary" href="https://dev.arvados.org/projects/arvados/repository/arvados-dev/revisions/575f75167b48977b3c825e30f944fca70a74f901">arvados-dev|575f751</a></p>
<pre>
$ crunchstat-summary --job <a href="https://arvadosapi.com/4xphq-8i9sb-jq0ekny1xou3zoh">4xphq-8i9sb-jq0ekny1xou3zoh</a> | expand -t12
category    metric      max         max_rate
net:eth0    rx          1754364530  41658344.87
net:eth0    tx          38837956    920817.97
mem         rss         349814784   -
mem         cache       1678139392  -
mem         swap        0           -
mem         pgmajfault  0           -
keepcalls   put         0           0.00
keepcalls   get         0           0.00
keepcache   miss        0           0.00
keepcache   hit         0           0.00
blkio:0:0   write       0           0.00
blkio:0:0   read        0           0.00
net:keep0   rx          0           0.00
net:keep0   tx          0           0.00
cpu         sys         1.92        0.04
cpu         user        3.83        0.09
cpu         cpus        8           -
fuseops     write       0           0.00
fuseops     read        0           0.00
</pre>
<p>Or, equivalently,</p>
<pre>
$ zcat ~/arvados/tools/crunchstat-summary/tests/logfile_20151204190335.txt.gz | crunchstat-summary
</pre>
<p>or</p>
<pre>
$ crunchstat-summary --log-file ~/arvados/tools/crunchstat-summary/tests/logfile_20151204190335.txt.gz
</pre>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33396">2015-12-10 21:19:21 UTC</a></p>
<p>Another example:</p>
<pre>
$ crunchstat-summary --job <a href="https://arvadosapi.com/qr1hi-8i9sb-qwr5epuobcre09m">qr1hi-8i9sb-qwr5epuobcre09m</a> | expand -t12
category    metric      max         max_rate
mem         cache       15702978560 -
mem         pgmajfault  315         -
mem         rss         10732777472 -
cpu         sys         152.38      0.11
cpu         user        29305.28    2.55
cpu         cpus        8           -
net:eth0    rx          44485200    649806.51
net:eth0    tx          6898753050  74613976.17
</pre>
<p>Brett Smith (brett.smith@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33443">2015-12-12 02:32:00 UTC</a></p>
<p>Reviewing <a class="changeset" title="7901: Add crunchstat-summary program." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/1d4f047fab325e9c6844b5550747e9a815e6654c">1d4f047</a>.</p>
<p>I assume the answer is no, but just being extra-safe: I don't recognize the <code>collection.py</code> Crunch script from anywhere. Should we be concerned about the PDHs that appear in those logs, privacy-wise? If so, we might need to aggressively garbage collect the commit from the Git server. I double-checked that all the API tokens have been expired, and they have, so that's reassuring.</p>
<p>In Summarizer._logdata, you slipped into Ruby mind and raised bare strings a few times. That's a bug:</p>
<pre>>>> raise "foo"
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
TypeError: exceptions must be old-style classes or derived from BaseException, not str
</pre>
<p>I suggest ValueError as the most Pythonic exception for both these cases, but I'm not wedded to it.</p>
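<p>As a minimal sketch of the suggested fix, here is a <code>ValueError</code> instance raised in place of a bare string. The helper name <code>pick_log_file</code> and its check are hypothetical and are not taken from Summarizer._logdata:</p>

```python
def pick_log_file(filenames):
    """Return the single .txt log file name from a collection listing.

    Hypothetical helper, not the real Summarizer._logdata: it only
    illustrates raising ValueError where a bare string was raised.
    """
    logs = [name for name in filenames if name.endswith('.txt')]
    if len(logs) != 1:
        # `raise "..."` would itself fail with TypeError; raise an
        # exception instance instead.
        raise ValueError(
            "expected exactly 1 .txt log file, found {}".format(len(logs)))
    return logs[0]


try:
    pick_log_file(['a.txt', 'b.txt'])
except ValueError as err:
    print(err)  # expected exactly 1 .txt log file, found 2
```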
<p>Nothing else is critical or would block a merge, but a few readability suggestions:</p>
<p>There are a few single-letter variables lurking around: <code>j</code> and <code>c</code> in Summarizer._logdata, <code>m</code> in Summarizer.run, and <code>s</code> in the test method. They're all innocuous now, but code grows…</p>
<p>Suggest <code>str.endswith('.ext')</code> over <code>str[-4:] == '.ext'</code>. This comes up in Summarizer.run, Summarizer._logdata, and the test method. Not writing the length yourself is DRYer.</p>
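<p>A quick illustration, using the test log's file name from the examples above, of how a hand-counted slice can silently miss while <code>str.endswith()</code> cannot. It also accepts a tuple of suffixes:</p>

```python
name = 'logfile_20151204190335.txt.gz'

# Slicing makes you count the suffix length yourself; a miscount fails silently.
print(name[-4:] == '.gz')                  # False ('t.gz' != '.gz')
print(name[-3:] == '.gz')                  # True, only because 3 was counted right

# str.endswith() takes the suffix itself, or a tuple of alternatives.
print(name.endswith('.gz'))                # True
print(name.endswith(('.txt', '.txt.gz')))  # True
```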
<p>Anything you can do with printf-style strings, you can also do with format strings, if you like:</p>
<ul>
<li><code>{!r}</code> to write an object's repr, as in Summarizer.run.</li>
<li><code>{:.2f}</code> to specify precision for a float.</li>
</ul>
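<p>For example, the two styles produce identical strings for a repr and for a fixed-precision float. The job UUID comes from the examples above; the full-precision rate value is made up for illustration:</p>

```python
job_uuid = 'qr1hi-8i9sb-qwr5epuobcre09m'  # from the example above
rate = 41658344.8712                      # made-up full-precision value

# %r  ->  {!r}   (repr of an object)
print('%r' % job_uuid)
print('{!r}'.format(job_uuid))            # 'qr1hi-8i9sb-qwr5epuobcre09m'

# %.2f  ->  {:.2f}   (float with two decimal places)
print('%.2f' % rate)
print('{:.2f}'.format(rate))              # 41658344.87
```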
<p>There are some unused imports that could be cleaned up: itertools and os in summarizer.py. fnmatch in the tests. (Did you want to import glob and make your loop <code>for fnm in glob.glob(os.path.join(dirname, '*.txt.gz'))</code>?)</p>
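<p>Here is a sketch of the glob-based loop. It assumes the test logs are gzipped UTF-8 text and uses Python 3's text-mode <code>gzip.open</code>. The name <code>log_lines</code> is hypothetical and does not refer to a function in the test module:</p>

```python
import glob
import gzip
import os


def log_lines(dirname):
    """Yield (filename, line) for every line of every *.txt.gz file in dirname.

    Sketch only: assumes gzipped UTF-8 text logs, like
    tests/logfile_20151204190335.txt.gz.
    """
    # sorted() gives a stable order; glob() itself does not guarantee one.
    for fnm in sorted(glob.glob(os.path.join(dirname, '*.txt.gz'))):
        with gzip.open(fnm, 'rt', encoding='utf-8') as logfile:
            for line in logfile:
                yield os.path.basename(fnm), line.rstrip('\n')
```

<p>Files that don't match the pattern, such as a plain <code>notes.txt</code>, are never opened, so no <code>fnmatch</code> filtering is needed inside the loop.</p>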
<p>Props for <code>for val, stat in zip(words[::2], words[1::2])</code>, that was a fun line of code to read.</p>
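<p>To show what that pairing idiom does, here is a hypothetical sketch of max / max_rate bookkeeping. It assumes crunchstat lines shaped like <code>cpu 3.83 user 1.92 sys 8 cpus -- interval 10.0 seconds 0.90 user 0.40 sys</code>, that is, cumulative "value stat" pairs followed by optional per-interval deltas. That line format and the <code>summarize</code> function are illustrative assumptions, not the real Summarizer code:</p>

```python
import collections


def summarize(lines):
    """Map (category, stat) -> [max, max_rate] over crunchstat-style lines.

    Assumed (hypothetical) line shape, "crunchstat: " prefix already removed:
        cpu 3.83 user 1.92 sys 8 cpus -- interval 10.0 seconds 0.90 user 0.40 sys
    """
    # Counters are assumed non-negative, so 0 is a safe starting max.
    # A rate of None means "no interval data" and would print as "-".
    stats = collections.defaultdict(lambda: [0, None])
    for line in lines:
        category, rest = line.split(' ', 1)
        totals, _, interval = rest.partition(' -- interval ')
        words = totals.split()
        # Pair each value with the stat name that follows it.
        for val, stat in zip(words[::2], words[1::2]):
            entry = stats[(category, stat)]
            entry[0] = max(entry[0], float(val))
        if interval:
            iwords = interval.split()
            seconds = float(iwords[0])  # iwords[1] is the literal "seconds"
            deltas = iwords[2:]
            for val, stat in zip(deltas[::2], deltas[1::2]):
                entry = stats[(category, stat)]
                rate = float(val) / seconds
                if entry[1] is None or rate > entry[1]:
                    entry[1] = rate
    return dict(stats)


sample = [
    'cpu 1.00 user 0.50 sys 8 cpus -- interval 10.0 seconds 0.50 user 0.20 sys',
    'cpu 3.83 user 1.92 sys 8 cpus -- interval 10.0 seconds 0.90 user 0.40 sys',
    'mem 349814784 rss 0 swap',
]
for key, (peak, peak_rate) in sorted(summarize(sample).items()):
    print(key, peak, '-' if peak_rate is None else '{:.2f}'.format(peak_rate))
```

<p>Stats with no interval deltas, such as <code>cpus</code> or <code>rss</code> here, keep a rate of <code>None</code>. That matches the "-" entries in the max_rate column of the reports above.</p>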
<p>Thanks.</p>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33501">2015-12-14 21:50:52 UTC</a></p>
<p>Brett Smith wrote:</p>
<blockquote>
<p>I assume the answer is no, but just being extra-safe: I don't recognize the <code>collection.py</code> Crunch script from anywhere. Should we be concerned about the PDHs that appear in those logs, privacy-wise? If so, we might need to aggressively garbage collect the commit from the Git server. I double-checked that all the API tokens have been expired, and they have, so that's reassuring.</p>
</blockquote>
<p>This is from the Keep performance tests on 4xphq, from <a class="issue tracker-6 status-3 priority-4 priority-default closed parent" title="Idea: [FUSE] Write a FUSE performance pipeline (Resolved)" href="https://dev.arvados.org/issues/7780">#7780</a>. The input is LobSTR reference data. (Sorry, probably could have saved you some time by mentioning where those examples came from.)</p>
<blockquote>
<p>In Summarizer._logdata, you slipped into Ruby mind and raised bare strings a few times. That's a bug:</p>
</blockquote>
<p>Oops, fixed. ValueError it is.</p>
<blockquote>
<p>There are a few single-letter variables lurking around: <code>j</code> and <code>c</code> in Summarizer._logdata, <code>m</code> in Summarizer.run, and <code>s</code> in the test method. They're all innocuous now, but code grows…</p>
</blockquote>
<p>fixed → job, collection, summarizer.</p>
<blockquote>
<p>Suggest <code>str.endswith('.ext')</code> over <code>str[-4:] == '.ext'</code>. This comes up in Summarizer.run, Summarizer._logdata, and the test method. Not writing the length yourself is DRYer.</p>
</blockquote>
<p>Ah yes, that's better.</p>
<blockquote>
<p>Anything you can do with printf-style strings, you can also do with format strings, if you like:</p>
<ul>
<li><code>{!r}</code> to write an object's repr, as in Summarizer.run.</li>
<li><code>{:.2f}</code> to specify precision for a float.</li>
</ul>
</blockquote>
<p>Thanks. Updated these so we're just using format() everywhere.</p>
<blockquote>
<p>There are some unused imports that could be cleaned up: itertools and os in summarizer.py. fnmatch in the tests. (Did you want to import glob and make your loop <code>for fnm in glob.glob(os.path.join(dirname, '*.txt.gz'))</code>?)</p>
</blockquote>
<p>Yes! I started out hoping to use fnmatch and didn't go back and clean this up. glob() is better.</p>
<blockquote>
<p>Props for <code>for val, stat in zip(words[::2], words[1::2])</code>, that was a fun line of code to read.</p>
</blockquote>
<p>I merely selected it from various approaches on Stack Exchange... but yes, I agree. :)</p>
<p>Rebased, now at <a class="changeset" title="7901: Add crunchstat-summary program." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/40158e8f7027d51ca704c3fd6039818acf2b21c0">40158e8</a>.</p>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33576">2015-12-16 15:35:45 UTC</a></p>
<p>7901-human-summary @ <a class="changeset" title="7901: Add job stats, elapsed time, summed user+sys and tx+rx, and some human-readable highlights." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/cf4c50aa0ef8522f2c1f0ccaf5ce427051fb5af9">cf4c50a</a></p>
<pre>
category    metric      task_max    task_max_rate  job_total
blkio:0:0   read        0           0.00           0
blkio:0:0   write       0           0.00           0
cpu         cpus        8           -              -
cpu         sys         1.92        0.04           1.92
cpu         user        3.83        0.09           3.83
cpu         user+sys    5.75        0.13           5.75
fuseops     read        0           0.00           0
fuseops     write       0           0.00           0
keepcache   hit         0           0.00           0
keepcache   miss        0           0.00           0
keepcalls   get         0           0.00           0
keepcalls   put         0           0.00           0
mem         cache       1678139392  -              -
mem         pgmajfault  0           -              0
mem         rss         349814784   -              -
mem         swap        0           -              -
net:eth0    rx          1754364530  41658344.87    1754364530
net:eth0    tx          38837956    920817.97      38837956
net:eth0    tx+rx       1793202486  42579162.83    1793202486
net:keep0   rx          0           0.00           0
net:keep0   tx          0           0.00           0
net:keep0   tx+rx       0           0.00           0
time        elapsed     80          -              80
# Max CPU time spent by a single task: 5.75s
# Max CPU usage in a single interval: 13.00%
# Overall CPU usage: 7.19%
# Max memory used by a single task: 0.35GB
# Max network traffic in a single task: 1.79GB
# Max network speed in a single interval: 42.58MB/s
</pre>
<p>Brett Smith (brett.smith@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33631">2015-12-16 21:47:49 UTC</a></p>
<p>Reviewing 7901-human-summary @ <a class="changeset" title="7901: Add job stats, elapsed time, summed user+sys and tx+rx, and some human-readable highlights." href="https://dev.arvados.org/projects/arvados/repository/arvados/revisions/cf4c50aa0ef8522f2c1f0ccaf5ce427051fb5af9">cf4c50a</a>.</p>
<p>This is good to merge. One thing that might improve readability is using collections.defaultdict for some of the deep dictionaries. This can save you from doing lots of <code>key in dict</code> and <code>dict.setdefault</code> guarding. I think the two variable declarations that would pay off most (with the right imports) are:</p>
<pre><code class="python syntaxhl"><span class="kn">import</span> <span class="nn">collections</span>
<span class="kn">import</span> <span class="nn">functools</span>

<span class="n">task_stats</span> <span class="o">=</span> <span class="n">collections</span><span class="p">.</span><span class="n">defaultdict</span><span class="p">(</span><span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">collections</span><span class="p">.</span><span class="n">defaultdict</span><span class="p">,</span> <span class="nb">dict</span><span class="p">))</span>
<span class="n">job_tot</span> <span class="o">=</span> <span class="n">collections</span><span class="p">.</span><span class="n">defaultdict</span><span class="p">(</span><span class="n">functools</span><span class="p">.</span><span class="n">partial</span><span class="p">(</span><span class="n">collections</span><span class="p">.</span><span class="n">defaultdict</span><span class="p">,</span> <span class="nb">int</span><span class="p">))</span>
</code></pre>
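<p>Here is a short sketch of how those declarations remove the guarding. The sample tuples are hypothetical; only the two declarations come from the suggestion above:</p>

```python
import collections
import functools

# Two levels of defaultdict, then a plain dict (or int) at the leaf level.
task_stats = collections.defaultdict(
    functools.partial(collections.defaultdict, dict))
job_tot = collections.defaultdict(
    functools.partial(collections.defaultdict, int))

# Hypothetical samples: (task id, category, stat, value).
samples = [
    (0, 'cpu', 'user', 3.83),
    (0, 'cpu', 'sys', 1.92),
    (1, 'net:eth0', 'rx', 1754364530),
]

for task_id, category, stat, value in samples:
    # No "key in dict" or setdefault() guards: missing intermediate
    # levels are created on first access.
    task_stats[task_id][category][stat] = value
    # int() supplies the starting 0 for a running total.
    job_tot[category][stat] += value

# Without defaultdict, the same store needs explicit guarding, e.g.:
#   plain.setdefault(task_id, {}).setdefault(category, {})[stat] = value
print(job_tot['cpu']['user'] + job_tot['cpu']['sys'])
```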
<p>And then this is borderline trivial, but in the <code>for args in…</code> loop in _report_gen, using tuples for the args instead of lists would be a little more Pythonic. By convention, tuples suit fixed-length, heterogeneous sequences, while lists suit variable-length, homogeneous ones.</p>
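<p>To illustrate that convention, the human-readable highlights from the 7901-human-summary output can be reproduced from its table with a list of fixed-shape <code>(template, value)</code> tuples. The formulas are inferred from the printed numbers. For example, overall CPU usage matches job-total user+sys divided by elapsed time. Treat this as a sketch, not the actual _report_gen code:</p>

```python
# Figures from the 4xphq summary table (task_max, task_max_rate, job_total).
max_task_cpu = 5.75         # cpu user+sys, task_max (seconds)
max_cpu_rate = 0.13         # cpu user+sys, task_max_rate (CPU-seconds/second)
job_cpu = 5.75              # cpu user+sys, job_total (seconds)
elapsed = 80                # time elapsed, job_total (seconds)
max_rss = 349814784         # mem rss, task_max (bytes)
max_net = 1793202486        # net:eth0 tx+rx, task_max (bytes)
max_net_rate = 42579162.83  # net:eth0 tx+rx, task_max_rate (bytes/second)

# Outer list: variable-length, homogeneous. Inner tuples: fixed-shape records.
highlights = [
    ('Max CPU time spent by a single task: {:.2f}s', max_task_cpu),
    ('Max CPU usage in a single interval: {:.2f}%', max_cpu_rate * 100),
    ('Overall CPU usage: {:.2f}%', job_cpu / elapsed * 100),
    ('Max memory used by a single task: {:.2f}GB', max_rss / 1e9),
    ('Max network traffic in a single task: {:.2f}GB', max_net / 1e9),
    ('Max network speed in a single interval: {:.2f}MB/s', max_net_rate / 1e6),
]

report = ['# ' + template.format(value) for template, value in highlights]
print('\n'.join(report))
```

<p>Running it prints the same six "#" lines shown at the end of that summary output.</p>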
<p>Thanks.</p>
<p>Tom Clegg (tom@curii.com), <a href="https://dev.arvados.org/issues/7901?journal_id=33640">2015-12-17 15:10:10 UTC</a></p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset arvados|commit:0d66f5f5c5173f0faad6318a9ac87d11964e5748.</p>