Bug #2865

closed

Reduce Keep server memory use

Added by Brett Smith almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tim Pierce
Category: -
Story points: 2.0

Description

A job failed this morning (2014-05-26) because the Keep server ran out of memory and returned errors on several block reads at that time. When the Python SDK couldn't get the requested blocks from any Keep server, it translated that failure into a block-not-found exception. See the job log output. The input is fee29077095fed2e695100c299f11dc5+2727. Errors look like this:

2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr Traceback (most recent call last):
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/tmp/crunch-job/src/crunch_scripts/test/para/grep", line 18, in <module>
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     for line in input_file.readlines():
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 183, in readlines
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     for newdata in datasource:
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 155, in readall
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data = self.read(size)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 139, in read
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data += self._stream.readfrom(locator+segmentoffset, segmentsize)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 265, in readfrom
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data += self._keep.get(locator)[segmentoffset:segmentoffset+segmentsize]
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 305, in get
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     raise arvados.errors.NotFoundError("Block not found: %s" % expect_hash)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr arvados.errors.NotFoundError: Block not found: 43161251a3347a55e4a826daa730977f
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr srun: error: compute34: task 0: Exited with exit code 1

Subtasks: 2 (0 open, 2 closed)

Task #2954: Rewrite Get and Put to minimize unnecessary allocation (Resolved, Tim Pierce, 05/26/2014)
Task #2957: Review 2865-keep-memory-usage (Resolved, Tim Pierce, 05/26/2014)

Added by Tim Pierce almost 11 years ago

Revision 2b2adb42

2865: collect garbage after each GET and PUT.

Quick fix for Keep OOM errors: reclaim memory aggressively.
Fixes #2865.

Added by Tim Pierce almost 11 years ago

Revision 70dd308c

2865: reduce Keep memory usage.

Eliminate ioutil.ReadAll to reduce unnecessary 2x memory allocations.

  • PutBlockHandler allocates a buffer exactly as long as
    req.ContentLength and fills it with io.ReadFull.
  • GetBlock uses ioutil.ReadFile (which it arguably should have been
    doing in the first place).

Refs #2865.

Added by Tim Pierce almost 11 years ago

Revision 20ffc967

2865: add traffic_test.py for testing Keep performance.

Refs #2865.
