Bug #2865

Reduce Keep server memory use

Added by Brett Smith over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Tim Pierce
Category:
-
Start date:
05/26/2014
Due date:
% Done:

100%

Estimated time:
(Total: 9.00 h)
Story points:
2.0

Description

A job failed this morning because the Keep server ran out of memory and returned several block read errors. When the Python SDK couldn't get the requested blocks from any Keep server, it translated that into a block-not-found exception. See the job log output. The input is fee29077095fed2e695100c299f11dc5+2727. Errors look like this:

2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr Traceback (most recent call last):
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/tmp/crunch-job/src/crunch_scripts/test/para/grep", line 18, in <module>
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     for line in input_file.readlines():
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 183, in readlines
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     for newdata in datasource:
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 155, in readall
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data = self.read(size)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 139, in read
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data += self._stream.readfrom(locator+segmentoffset, segmentsize)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 265, in readfrom
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data += self._keep.get(locator)[segmentoffset:segmentoffset+segmentsize]
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 305, in get
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     raise arvados.errors.NotFoundError("Block not found: %s" % expect_hash)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr arvados.errors.NotFoundError: Block not found: 43161251a3347a55e4a826daa730977f
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr srun: error: compute34: task 0: Exited with exit code 1

Subtasks

Task #2954: Rewrite Get and Put to minimize unnecessary allocation (Resolved, Tim Pierce)

Task #2957: Review 2865-keep-memory-usage (Resolved, Tim Pierce)

Associated revisions

Revision 2b2adb42 (diff)
Added by Tim Pierce over 6 years ago

2865: collect garbage after each GET and PUT.

Quick fix for Keep OOM errors: reclaim memory aggressively.
Fixes #2865.
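The quick fix above amounts to forcing a garbage collection as each request handler returns. A minimal sketch of the idea (the handler name and the simulated 64 MiB block are illustrative, not the actual Keep source):

```go
package main

import (
	"fmt"
	"runtime"
)

// serveBlock simulates a Keep GET handler: it allocates a large buffer
// for the block, and the deferred runtime.GC() reclaims that memory as
// soon as the handler returns, rather than waiting for the next
// scheduled collection.
func serveBlock() int {
	defer runtime.GC()
	buf := make([]byte, 64<<20) // Keep blocks are at most 64 MiB
	return len(buf)
}

func main() {
	fmt.Println(serveBlock()) // 67108864
}
```

The trade-off is extra CPU per request in exchange for a lower steady-state footprint.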

Revision 70dd308c (diff)
Added by Tim Pierce over 6 years ago

2865: reduce Keep memory usage.

Eliminate ioutil.ReadAll to reduce unnecessary 2x memory allocations.

  • PutBlockHandler allocates a buffer exactly as long as
    req.ContentLength and fills it with io.ReadFull.
  • GetBlock uses ioutil.ReadFile (which it arguably should have been
    doing in the first place).

Refs #2865.

Revision 20ffc967 (diff)
Added by Tim Pierce over 6 years ago

2865: add traffic_test.py for testing Keep performance.

Refs #2865.

History

#1 Updated by Brett Smith over 6 years ago

I have been unable to reproduce this so far. Running on compute34.qr1hi, using the same API token as the original job, I can run this in a Python interpreter, which I believe is functionally equivalent to what the grep script does:

import arvados

cr = arvados.CollectionReader('fee29077095fed2e695100c299f11dc5+2727')
for s in cr.all_streams():
    for f in s.all_files():
        file_cr = arvados.CollectionReader(f.as_manifest())
        for crfile in file_cr.all_files():
            for line in crfile.readlines():
                pass

This runs to completion without raising an exception.

#2 Updated by Brett Smith over 6 years ago

One weird discrepancy I can't figure out: the line that actually raises this exception (303, per the grep below) doesn't match the line number in the traceback (305):

compute34.qr1hi:~$ grep -nFC3 "Block not found" /usr/local/lib/python2.7/dist-packages/arvados/keep.py
300-
301-        slot.set(None)
302-        self.cap_cache()
303:        raise arvados.errors.NotFoundError("Block not found: %s" % expect_hash)
304-
305-    def get_url(self, url, headers, expect_hash):
306-        h = httplib2.Http()

#3 Updated by Brett Smith over 6 years ago

  • Subject changed from "grep job on qr1hi fails to find Keep blocks" to "Reduce Keep server memory use"
  • Description updated (diff)

Reviewing the logs indicates that the Keep server had several block read errors at this time because it was out of memory. When the Python SDK couldn't get requested blocks from any Keep server, it translated that into the Block not found exception.

Abram mentioned in IRC that he had garbage collection issues working with Go in the past, and found it helpful to make explicit collection requests when appropriate. Ward adds:

cure: imo keep should aggressively free memory after each block has been served
cure: because the kernel will cache it anyway in the disk cache, making subsequent requests fast
cure: (maybe it already does, I haven't looked)

#4 Updated by Tim Pierce over 6 years ago

  • Status changed from New to In Progress
  • Assigned To set to Tim Pierce
  • Target version set to 2014-05-28 Pipeline Factory

#5 Updated by Tim Pierce over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:2b2adb421b9b82b75fe8a635442dfe8e1fab775a.

#6 Updated by Ward Vandewege over 6 years ago

  • Status changed from Resolved to In Progress

#7 Updated by Ward Vandewege over 6 years ago

  • Target version changed from 2014-05-28 Pipeline Factory to 2014-06-17 Curating and Crunch

#8 Updated by Ward Vandewege over 6 years ago

  • Story points set to 2.0

#9 Updated by Tim Pierce over 6 years ago

To bring this up to date, here is what we have found. Testing consists of storing ~3GB of data in Keep once, and then running repeated GET requests on random blocks, with one or more concurrent clients on the same machine.

  • Base memory footprint for Keep when idle is ~5MB.
  • After issuing the necessary PUT requests to Keep, the server's memory footprint is 260MB (256MB higher than base).
  • With one client running serial GET requests, server memory footprint rises to 516MB (256MB higher than post-PUT).
  • Each additional concurrent client adds roughly 128MB to server footprint.
  • After clients terminate, memory returns to baseline levels after 5-10 minutes.

Curiously, Go's own profiling tools do not account for the RSS reported by the kernel.

Peter has observed that ioutil.ReadAll grows its buffer exponentially, so reading 64MB of data over the network can end up allocating a 128MB buffer. That is likely partly responsible for the bloated memory usage.

Next steps:

  • Rewrite to avoid ioutil.ReadAll:
    • Rewrite main.PutBlock to stream data from the network directly to a temporary file instead of an in-memory buffer, checksumming as it goes.
    • Possibly: rewrite main.GetBlock to return a block even if the checksum does not match, simply logging an error when it doesn't. Doing so would let us handle GET with a relatively small buffer. It is reasonable to expect the client to perform its own checksum anyway, so it is not necessarily unwise for Keep to return the data as stored.
  • (lower priority) Understand the discrepancy between "go tool pprof" and the RSS reported by top. My assumption is that this is an artifact of Go's allocator hanging on to freed memory so that the next call to make will not have to go back to malloc(3).

#10 Updated by Ward Vandewege over 6 years ago

I did some testing, and this branch looks good to me. Please merge.

For future reference I also experimented with replacing the

defer runtime.GC()

lines with

defer debug.FreeOSMemory()

which seems to lead to (much) lower but much more variable memory use. With 2 GET and 2 PUT clients accessing 2 different blocks with watch -n0.5, I saw a fairly stable RSS of 400M with this branch. When using FreeOSMemory instead, memory use was about half that on average, but much more variable (between 64M and 400M). CPU load seemed roughly 10% higher.
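For reference, the difference between the two calls: runtime.GC only runs a collection, while debug.FreeOSMemory runs a collection and then asks the runtime to return as much freed memory to the OS as possible — consistent with the lower-but-variable RSS and the ~10% extra CPU observed above. A sketch of the variant tested (handler name illustrative):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// serveBlock simulates a Keep handler using Ward's variant:
// debug.FreeOSMemory forces a GC and returns freed pages to the OS,
// instead of merely making them available for reuse by the process.
func serveBlock() int {
	defer debug.FreeOSMemory()
	buf := make([]byte, 64<<20) // one 64 MiB Keep block
	return len(buf)
}

func main() {
	fmt.Println(serveBlock()) // 67108864
}
```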

#11 Updated by Tim Pierce over 6 years ago

  • Status changed from In Progress to Resolved
