Bug #2865

closed

Reduce Keep server memory use

Added by Brett Smith almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tim Pierce
Category: -
Story points: 2.0

Description

A job failed this morning because the Keep server was out of memory and returned several block read errors. When the Python SDK couldn't get the requested blocks from any Keep server, it translated that into a "block not found" exception. See the job log output. The input is fee29077095fed2e695100c299f11dc5+2727. The errors look like this:

2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr Traceback (most recent call last):
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/tmp/crunch-job/src/crunch_scripts/test/para/grep", line 18, in <module>
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     for line in input_file.readlines():
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 183, in readlines
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     for newdata in datasource:
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 155, in readall
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data = self.read(size)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 139, in read
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data += self._stream.readfrom(locator+segmentoffset, segmentsize)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 265, in readfrom
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     data += self._keep.get(locator)[segmentoffset:segmentoffset+segmentsize]
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 305, in get
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr     raise arvados.errors.NotFoundError("Block not found: %s" % expect_hash)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr arvados.errors.NotFoundError: Block not found: 43161251a3347a55e4a826daa730977f
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr srun: error: compute34: task 0: Exited with exit code 1
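
For illustration, the client-side behavior described above (every Keep server failing to return the block, and the SDK reporting that as a missing block) can be sketched roughly as follows. This is not the actual arvados.keep code: get_block, fetch, and KeepServerError are hypothetical names, and the sketch assumes that a 404 is the only response that really means the block is absent. It shows one way a client could keep server-side failures such as out-of-memory errors from being reported as NotFoundError.

```python
# Hypothetical sketch; not the actual arvados.keep implementation.

class NotFoundError(Exception):
    """Stand-in for arvados.errors.NotFoundError."""


class KeepServerError(Exception):
    """Hypothetical error: servers failed instead of answering 404."""


def get_block(locator, servers, fetch):
    """Try each server in turn and return the first successful body.

    fetch(server, locator) is an assumed callable that returns a
    (status, body) tuple, where status is an HTTP status code.
    """
    statuses = []
    for server in servers:
        status, body = fetch(server, locator)
        if status == 200:
            return body
        statuses.append(status)
    # Report "not found" only when every server actually answered 404.
    if statuses and all(s == 404 for s in statuses):
        raise NotFoundError("Block not found: %s" % locator)
    # Otherwise at least one server failed for another reason
    # (for example a 5xx while out of memory), so say so.
    raise KeepServerError(
        "Could not read %s; server statuses: %s" % (locator, statuses))
```

With a distinction like this, the job log would point at the Keep servers rather than suggesting that block 43161251a3347a55e4a826daa730977f is missing.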

Subtasks: 2 (0 open, 2 closed)

Task #2954: Rewrite Get and Put to minimize unnecessary allocation (Resolved, Tim Pierce, 05/26/2014)
Task #2957: Review 2865-keep-memory-usage (Resolved, Tim Pierce, 05/26/2014)