Bug #5831

closed

[SDKs] CollectionReader.open(filename, 'r') fails (arvados/keep.py 665 in get_from_cache; slot is NoneType)

Added by Sarah Guthrie almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: High
Assigned To:
Category: SDKs
Target version:
Story points: -

Description

This bug appeared in job: https://workbench.su92l.arvadosapi.com/jobs/su92l-8i9sb-vu3ycb8jl6hm8x3
Pipeline instance: https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-zjxe3ss9n51ye6q

The priority is marked High because this pipeline must complete before work on the paper can progress.

As a further note, most files read by these tasks were gzipped, so the file object returned by CollectionReader.open(filename, 'r') was passed to gzip.GzipFile as the fileobj keyword argument. All gzipped files were copied from tb05z to su92l using arv copy.
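For reference, the read pattern described above can be sketched as follows. This is a minimal illustration, not the task's actual add_callsets.py code: `iter_gzip_lines` is a hypothetical helper, and an in-memory io.BytesIO stream stands in for the Keep-backed reader so the example runs without the Arvados SDK.

```python
import gzip
import io


def iter_gzip_lines(fileobj):
    """Yield decompressed lines from an already-open binary file object.

    `fileobj` stands in for the reader returned by
    CollectionReader.open(filename, 'r'); GzipFile only needs it to
    provide read().
    """
    # GzipFile does not close a caller-supplied fileobj when it closes.
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        for line in gz:
            # Decompression pulls compressed bytes by calling fileobj.read(),
            # which is where the traceback below enters the Arvados SDK.
            yield line


if __name__ == "__main__":
    # Local stand-in for a Keep-backed reader: an in-memory gzip stream.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(b"line one\nline two\n")
    buf.seek(0)
    for line in iter_gzip_lines(buf):
        print(line)
```

Against the SDK, the hypothetical call site would be `iter_gzip_lines(reader.open(filename, 'r'))`, where `reader` is an arvados.CollectionReader.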

7 of 80 tasks failed. Each of these tasks read 433 different gzipped files and 1 plain text file.

3 of these tasks failed on the first gzipped file (tasks 22, 50, and 68).
1 failed on the second gzipped file (task 49).
1 failed on the 19th gzipped file (task 67).
1 failed on the 193rd gzipped file (task 69).
1 failed on the 358th gzipped file (task 66).

The failure was identical in all 7 cases:

2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr Traceback (most recent call last):
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/tmp/crunch-job/src/crunch_scripts/add_callsets.py", line 230, in <module>
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr for line in callset_fastj_file_handle:
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/lib/python2.7/gzip.py", line 455, in readline
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr c = self.read(readsize)
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/lib/python2.7/gzip.py", line 261, in read
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr self._read(readsize)
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/lib/python2.7/gzip.py", line 301, in _read
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr buf = self.fileobj.read(size)
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 45, in before_close_wrapper
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr return orig_func(self, *args, **kwargs)
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 157, in num_retries_setter
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr return orig_func(self, *args, **kwargs)
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 871, in read
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr data = self.arvadosfile.readfrom(self._filepos, size, num_retries)
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 732, in readfrom
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr block = self.parent._my_block_manager().get_block_contents(lr.locator, num_retries=num_retries, cache_only=bool(data))
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 511, in get_block_contents
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr return self._keep.get_from_cache(locator)
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 665, in get_from_cache
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr if slot.ready.is_set():
2015-04-27_17:24:44 su92l-8i9sb-vu3ycb8jl6hm8x3 24834 22 stderr AttributeError: 'NoneType' object has no attribute 'ready'
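The last frame shows that `slot` was None when `get_from_cache` evaluated `slot.ready.is_set()`. The minimal sketch below reproduces that failure mode under assumed names (`CacheSlot`, `BlockCache`, and both `get_from_cache*` functions are hypothetical, not the SDK's actual classes): a cache lookup that returns None for a locator that is not cached, and a caller that dereferences the result without checking. It also shows a None guard; this report does not say how the SDK itself was fixed.

```python
import threading


class CacheSlot(object):
    """Hypothetical cache entry; `ready` is set once the block is loaded."""

    def __init__(self, locator):
        self.locator = locator
        self.ready = threading.Event()
        self.content = None

    def set(self, content):
        self.content = content
        self.ready.set()


class BlockCache(object):
    """Hypothetical block cache keyed by Keep locator."""

    def __init__(self):
        self._slots = {}
        self._lock = threading.Lock()

    def get(self, locator):
        # Returns None when nothing is cached for this locator.
        with self._lock:
            return self._slots.get(locator)

    def reserve(self, locator):
        # Create an empty (not yet ready) slot if none exists.
        with self._lock:
            slot = self._slots.get(locator)
            if slot is None:
                slot = CacheSlot(locator)
                self._slots[locator] = slot
            return slot


def get_from_cache_unguarded(cache, locator):
    slot = cache.get(locator)
    # Raises AttributeError: 'NoneType' object has no attribute 'ready'
    # when the locator is not in the cache.
    if slot.ready.is_set():
        return slot.content
    return None


def get_from_cache(cache, locator):
    """Return cached content, or None if the block is absent or still loading."""
    slot = cache.get(locator)
    if slot is not None and slot.ready.is_set():
        return slot.content
    return None
```

The guarded version treats "not cached" and "still loading" the same way, returning None so the caller can fall back to fetching the block.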

The log file for the pipeline is at: https://workbench.su92l.arvadosapi.com/collections/034b34e952e219150c44363f6ed1e23b+91/su92l-8i9sb-vu3ycb8jl6hm8x3.log.txt

Actions #1

Updated by Sarah Guthrie almost 9 years ago

The same bug occurred on a different pipeline instance with different data: https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-o1lftbvonzr9cif

Interestingly, this data had also been copied from tb05z to su92l. The jobs did not run to completion, but within the first 7 minutes, 4 tasks failed:

Tasks 22, 50, and 68 failed while reading the first gzipped file.
Task 69 failed on the 5th gzipped file.

These task numbers may be coincidental, but they are a subset of the failing task numbers in the first bug report.

Actions #2

Updated by Radhika Chippada almost 9 years ago

  • Category set to SDKs
  • Target version set to Bug Triage
Actions #3

Updated by Sarah Guthrie almost 9 years ago

When run on a single node, task 22 failed with the same error:
https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-tcdh7tuxh2cfr1z

Actions #4

Updated by Radhika Chippada almost 9 years ago

  • Status changed from New to In Progress
  • Assigned To set to Peter Amstutz
  • Target version changed from Bug Triage to 2015-04-29 sprint
Actions #5

Updated by Radhika Chippada almost 9 years ago

  • Status changed from In Progress to Resolved
Actions #6

Updated by Radhika Chippada almost 9 years ago

  • Subject changed from CollectionReader.open(filename, 'r') fails (arvados/keep.py 665 in get_from_cache; slot is NoneType) to [SDKs] CollectionReader.open(filename, 'r') fails (arvados/keep.py 665 in get_from_cache; slot is NoneType)
Actions #7

Updated by Sarah Guthrie almost 9 years ago

This bug appeared again: https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-29d3lxak8gdlo46

Never mind; I needed to update my SDK.
