Project

General

Profile

Actions

Bug #5856

closed

gzip.open(f) sometimes throws struct.error (unpack requires a string argument of length 4) on File handle from CollectionReader

Added by Sarah Guthrie about 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
-

Description

One of the tasks from https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-pfu68urmcrq00mo failed with this error
Three tasks from https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-8sgdxqj8dq8tqxa failed with this error as well

2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr Traceback (most recent call last):
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/tmp/crunch-job/src/crunch_scripts/add_callsets.py", line 234, in <module>
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr for line in callset_fastj_file_handle:
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 455, in readline
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr c = self.read(readsize)
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 261, in read
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr self._read(readsize)
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 325, in _read
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr self._read_eof()
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 344, in _read_eof
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr isize = read32(self.fileobj) # may exceed 2GB
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 25, in read32
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr return struct.unpack("<I", input.read(4))[0]
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr struct.error: unpack requires a string argument of length 4

Log file: https://workbench.su92l.arvadosapi.com/collections/bb3292a7fb7b482be8dbfb75690f804f+133/su92l-8i9sb-bb7pxsb146b8suq.log.txt?disposition=inline&size=79308197

zcat-ing the file that broke the task from the first pipeline (using keepmount) did not produce any errors


Subtasks 1 (0 open1 closed)

Task #5864: Review 5856-read-exactResolvedPeter Amstutz04/29/2015Actions
Actions #1

Updated by Peter Amstutz almost 9 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:c807aef7059b253512ba5da5d535e89e800b9c11.

Actions #2

Updated by Peter Amstutz almost 9 years ago

  • Status changed from Resolved to In Progress
Actions #3

Updated by Peter Amstutz almost 9 years ago

  • Target version changed from Bug Triage to 2015-05-20 sprint
Actions #4

Updated by Peter Amstutz almost 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:4570521e9d639806b1f25c93822d5cf0dbfe42d0.

Actions #5

Updated by Peter Amstutz almost 9 years ago

  • Status changed from Resolved to In Progress
  • Assigned To set to Peter Amstutz
Actions #6

Updated by Brett Smith almost 9 years ago

Reviewing 4570521

I'm not sure it makes much sense for the tests to manipulate sfile.arvadosfile directly. While it's not marked as private, it's not really part of the interface users are expected to use. What do you think of this:

  • Rename test_read_returns_first_block to test_read_block_crossing_behavior in both test case classes. In ArvadosFileReaderTestCase, this would have the code you added for test_read_crosses_blocks. This way the name is sensible in both places and we don't have to try to erase one in the child.
  • test_successive_reads is meant to ensure that state is maintained correctly across calls, so calling the stateless readfrom misses the mark. What if it called sfile.read(4) multiple times and checked for correct results until EOF? That would maintain the spirit of the test, and double-check that block crossing happens correctly.

In the docstring, "upto" should be two words.

Thanks.

Actions #7

Updated by Peter Amstutz almost 9 years ago

  • Status changed from In Progress to Resolved
  • Assigned To deleted (Peter Amstutz)

Brett Smith wrote:

Reviewing 4570521

I'm not sure it makes much sense for the tests to manipulate sfile.arvadosfile directly. While it's not marked as private, it's not really part of the interface users are expected to use. What do you think of this:

  • Rename test_read_returns_first_block to test_read_block_crossing_behavior in both test case classes. In ArvadosFileReaderTestCase, this would have the code you added for test_read_crosses_blocks. This way the name is sensible in both places and we don't have to try to erase one in the child.
  • test_successive_reads is meant to ensure that state is maintained correctly across calls, so calling the stateless readfrom misses the mark. What if it called sfile.read(4) multiple times and checked for correct results until EOF? That would maintain the spirit of the test, and double-check that block crossing happens correctly.

Done.

In the docstring, "upto" should be two words.

Fixed.

Actions #8

Updated by Brett Smith almost 9 years ago

  • Assigned To set to Peter Amstutz

c42ccd1f is good to merge. Thanks.

Actions

Also available in: Atom PDF