Bug #2865

Updated by Brett Smith over 6 years ago

A job failed this morning because the Keep server was out of memory at the time and returned several block read errors. When the Python SDK couldn't get the requested blocks from any Keep server, it translated that into a block-not-found exception. See "the job log output":https://workbench.qr1hi.arvadosapi.com/collections/6a6d2a9287031e55321913c87b6afd2c+85/qr1hi-8i9sb-yf63mvltprdjwz7.log.txt?disposition=inline&size=25193. The input is fee29077095fed2e695100c299f11dc5+2727. Errors look like this:

<pre>
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr Traceback (most recent call last):
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr File "/tmp/crunch-job/src/crunch_scripts/test/para/grep", line 18, in <module>
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr for line in input_file.readlines():
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 183, in readlines
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr for newdata in datasource:
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 155, in readall
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr data = self.read(size)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 139, in read
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr data += self._stream.readfrom(locator+segmentoffset, segmentsize)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 265, in readfrom
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr data += self._keep.get(locator)[segmentoffset:segmentoffset+segmentsize]
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 305, in get
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr raise arvados.errors.NotFoundError("Block not found: %s" % expect_hash)
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr arvados.errors.NotFoundError: Block not found: 43161251a3347a55e4a826daa730977f
2014-05-26_15:31:38 qr1hi-8i9sb-yf63mvltprdjwz7 20503 6 stderr srun: error: compute34: task 0: Exited with exit code 1
</pre>
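
For illustration, here is a minimal sketch of how a client could keep "no server has this block" separate from "servers failed for some other reason". It is not the SDK's actual code: @get_block@, @fetch@, @BlockNotFoundError@ and @BlockReadError@ are hypothetical names, and the assumption that a missing block is reported as HTTP 404 is mine.

```python
class BlockNotFoundError(Exception):
    """Every server that was asked answered that the block does not exist."""


class BlockReadError(Exception):
    """At least one server failed for a reason other than 'not found'."""


def get_block(locator, servers, fetch):
    """Try each server in turn and return the first successful read.

    `fetch(server, locator)` is a hypothetical callable returning
    (http_status, data). It may raise IOError on connection problems.
    """
    errors = {}
    for server in servers:
        try:
            status, data = fetch(server, locator)
        except IOError as exc:
            errors[server] = "connection error: %s" % exc
            continue
        if status == 200:
            return data
        errors[server] = "HTTP %d" % status

    # Report "not found" only when every server explicitly said so
    # (assumed here to be HTTP 404). Anything else is a read failure.
    if errors and all(e == "HTTP 404" for e in errors.values()):
        raise BlockNotFoundError("Block not found: %s" % locator)
    raise BlockReadError("Could not read %s: %r" % (locator, errors))
```

With this split, a server that is out of memory (for example, one answering with HTTP 500) would surface as a read error carrying the per-server details, rather than as a missing block.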
