Bug #5856
closed
gzip.open(f) sometimes throws struct.error (unpack requires a string argument of length 4) on File handle from CollectionReader
Added by Sarah Guthrie almost 10 years ago.
Updated almost 10 years ago.
Description
One of the tasks from https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-pfu68urmcrq00mo failed with this error
Three tasks from https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-8sgdxqj8dq8tqxa failed with this error as well
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr Traceback (most recent call last):
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/tmp/crunch-job/src/crunch_scripts/add_callsets.py", line 234, in <module>
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr for line in callset_fastj_file_handle:
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 455, in readline
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr c = self.read(readsize)
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 261, in read
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr self._read(readsize)
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 325, in _read
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr self._read_eof()
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 344, in _read_eof
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr isize = read32(self.fileobj) # may exceed 2GB
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr File "/usr/lib/python2.7/gzip.py", line 25, in read32
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr return struct.unpack("<I", input.read(4))[0]
2015-04-29_17:40:17 su92l-8i9sb-bb7pxsb146b8suq 720 182 stderr struct.error: unpack requires a string argument of length 4
Log file: https://workbench.su92l.arvadosapi.com/collections/bb3292a7fb7b482be8dbfb75690f804f+133/su92l-8i9sb-bb7pxsb146b8suq.log.txt?disposition=inline&size=79308197
zcat-ing the file that broke the task from the first pipeline (using keepmount) did not produce any errors
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:c807aef7059b253512ba5da5d535e89e800b9c11.
- Status changed from Resolved to In Progress
- Target version changed from Bug Triage to 2015-05-20 sprint
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:4570521e9d639806b1f25c93822d5cf0dbfe42d0.
- Status changed from Resolved to In Progress
- Assigned To set to Peter Amstutz
Reviewing 4570521
I'm not sure it makes much sense for the tests to manipulate sfile.arvadosfile
directly. While it's not marked as private, it's not really part of the interface users are expected to use. What do you think of this:
- Rename
test_read_returns_first_block
to test_read_block_crossing_behavior
in both test case classes. In ArvadosFileReaderTestCase, this would have the code you added for test_read_crosses_blocks
. This way the name is sensible in both places and we don't have to try to erase one in the child.
test_successive_reads
is meant to ensure that state is maintained correctly across calls, so calling the stateless readfrom
misses the mark. What if it called sfile.read(4)
multiple times and checked for correct results until EOF? That would maintain the spirit of the test, and double-check that block crossing happens correctly.
In the docstring, "upto" should be two words.
Thanks.
- Status changed from In Progress to Resolved
- Assigned To deleted (
Peter Amstutz)
Brett Smith wrote:
Reviewing 4570521
I'm not sure it makes much sense for the tests to manipulate sfile.arvadosfile
directly. While it's not marked as private, it's not really part of the interface users are expected to use. What do you think of this:
- Rename
test_read_returns_first_block
to test_read_block_crossing_behavior
in both test case classes. In ArvadosFileReaderTestCase, this would have the code you added for test_read_crosses_blocks
. This way the name is sensible in both places and we don't have to try to erase one in the child.
test_successive_reads
is meant to ensure that state is maintained correctly across calls, so calling the stateless readfrom
misses the mark. What if it called sfile.read(4)
multiple times and checked for correct results until EOF? That would maintain the spirit of the test, and double-check that block crossing happens correctly.
Done.
In the docstring, "upto" should be two words.
Fixed.
- Assigned To set to Peter Amstutz
Also available in: Atom
PDF