Arvados - Bug #2865: Reduce Keep server memory use (https://dev.arvados.org/issues/2865)

Journal 10770: Brett Smith (brett.smith@curii.com), 2014-05-26T14:41:23Z
<p>I have been unable to reproduce this so far. On compute34.qr1hi, using the same API token as the original job, I can run the following in a Python interpreter; I believe it is functionally equivalent to what the grep script does:</p>
<pre>
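import arvados

# Open the same collection the failing grep job read.
cr = arvados.CollectionReader('fee29077095fed2e695100c299f11dc5+2727')
for s in cr.all_streams():
    for f in s.all_files():
        # Re-open each file as its own single-file collection.
        file_cr = arvados.CollectionReader(f.as_manifest())
        for crfile in file_cr.all_files():
            # Read every line so that every Keep block is fetched.
            for line in crfile.readlines():
                pass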
</pre>
<p>This runs to completion without raising an exception.</p>

Journal 10771: Brett Smith (brett.smith@curii.com), 2014-05-26T15:10:49Z
<p>One discrepancy I can't explain: the line that raises this exception (303) doesn't match the line number in the backtrace (305):</p>
<pre>
compute34.qr1hi:~$ grep -nFC3 "Block not found" /usr/local/lib/python2.7/dist-packages/arvados/keep.py
300-
301- slot.set(None)
302- self.cap_cache()
303: raise arvados.errors.NotFoundError("Block not found: %s" % expect_hash)
304-
305- def get_url(self, url, headers, expect_hash):
306- h = httplib2.Http()
</pre>

Journal 10772: Brett Smith (brett.smith@curii.com), 2014-05-26T16:30:15Z
<ul><li><strong>Subject</strong> changed from <i>grep job on qr1hi fails to find Keep blocks</i> to <i>Reduce Keep server memory use</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/10772/diff?detail_id=8976">diff</a>)</li></ul><p>The logs show that the Keep server had several block read errors at this time because it was out of memory. When the Python SDK could not get a requested block from any Keep server, it reported that as the "Block not found" exception (<code>arvados.errors.NotFoundError</code>).</p>
<p>Abram mentioned on IRC that he had run into garbage collection problems with Go in the past, and found it helpful to request collection explicitly where appropriate. Ward adds:</p>
<pre>
cure: imo keep should aggressively free memory after each block has been served
cure: because the kernel will cache it anyway in the disk cache, making subsequent requests fast
cure: (maybe it already does, I haven't looked)
</pre>

Journal 10778: Tim Pierce (twp@curoverse.com), 2014-05-27T10:32:30Z
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assigned To</strong> set to <i>Tim Pierce</i></li><li><strong>Target version</strong> set to <i>2014-05-28 Pipeline Factory</i></li></ul>

Journal 10779: Tim Pierce (twp@curoverse.com), 2014-05-27T10:40:11Z
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset arvados|commit:2b2adb421b9b82b75fe8a635442dfe8e1fab775a.</p>

Journal 10805: Ward Vandewege (ward@curii.com), 2014-05-27T13:02:22Z
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>In Progress</i></li></ul>

Journal 10816: Ward Vandewege (ward@curii.com), 2014-05-27T13:46:29Z
<ul><li><strong>Target version</strong> changed from <i>2014-05-28 Pipeline Factory</i> to <i>2014-06-17 Curating and Crunch</i></li></ul>

Journal 10819: Ward Vandewege (ward@curii.com), 2014-05-27T14:05:33Z
<ul><li><strong>Story points</strong> set to <i>2.0</i></li></ul>

Journal 11037: Tim Pierce (twp@curoverse.com), 2014-06-03T15:40:53Z
<p>To bring this up to date, here is what we have found. The test stores ~3GB of data in Keep once, then issues repeated GET requests for random blocks from one or more concurrent clients on the same machine.</p>
<ul>
<li>Base memory footprint for Keep when idle is ~5MB.</li>
<li>After issuing the necessary PUT requests to Keep, the server's memory footprint is 260MB (256MB higher than base).</li>
<li>With one client running serial GET requests, server memory footprint rises to 516MB (256MB higher than post-PUT).</li>
<li>Each additional concurrent client adds roughly 128MB to the server footprint.</li>
<li>After clients terminate, memory returns to baseline levels after 5-10 minutes.</li>
</ul>
<p>Curiously, Go's own profiling tools do not account for the RSS reported by the kernel.</p>
<p>Peter has observed that <code>ioutil.ReadAll</code> grows its buffer exponentially, so reading 64MB of data over the network allocates a 128MB buffer. This is likely at least partly responsible for the inflated memory use.</p>
<p>Next steps:</p>
<ul>
<li>Rewrite to avoid <code>ioutil.ReadAll</code>:
<ul>
<li>Rewrite <code>main.PutBlock</code> to stream data from the network directly to a temporary file instead of an in-memory buffer, checksumming as it goes.</li>
<li>Possibly: rewrite <code>main.GetBlock</code> to return a block even if the checksum does not match, and simply log an error in that case. That would let us handle GET with a relatively small buffer. Clients can reasonably be expected to verify the checksum themselves anyway, so it is not necessarily unwise for Keep to return the stored data unverified.</li>
</ul>
</li>
<li>(lower priority) Understand the discrepancy between <code>go tool pprof</code> and the RSS reported by <code>top</code>. My assumption is that this is an artifact of Go's allocator holding on to this memory so that the next call to <code>make</code> does not have to go back to <code>malloc(3)</code>, but</li>
</ul>

Journal 11060: Ward Vandewege (ward@curii.com), 2014-06-04T10:51:47Z
<p>I did some testing, and this branch looks good to me. Please merge.</p>
<p>For future reference I also experimented with replacing the</p>
<pre><code>defer runtime.GC()</code></pre>
<p>lines with</p>
<pre><code>defer debug.FreeOSMemory()</code></pre>
<p>which seems to lead to much lower but much more variable memory use. With 2 GET and 2 PUT clients accessing 2 different blocks, run repeatedly with <code>watch -n0.5</code>, I saw a fairly stable RSS of 400MB with this branch. With <code>FreeOSMemory</code> instead, memory use was about half that on average but much more variable (between 64MB and 400MB), and CPU load seemed roughly 10% higher.</p>

Journal 11067: Tim Pierce (twp@curoverse.com), 2014-06-04T11:30:06Z
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul>