Bug #6276
open
[SDKs] Python Keep block-r/w errors should indicate which collection/file was unreadable/unwritable as a result
Description
summary
Errors like "block not found" provide a block locator, but it's generally not obvious why a process was trying to read that block: "Which of the 7 collections used in my job has a missing block?"
Ideally, the error message would tell you:
- collection portable_data_hash
- collection uuid and name
- filename
- (application-dependent) name of the collection (e.g., script_parameter in a crunch script)
The minimum information would be the collection's portable_data_hash, and the uuid if that's how we looked up the source collection in the first place. This is the most annoying part for the user to do manually, and in most cases it's enough information to get to the next step in troubleshooting/resolving the problem.
implementation
When calling get/put-block methods, catch the relevant exceptions and raise a context-aware exception instead (incorporating the text of the original exception).
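A minimal sketch of that wrapping approach, for illustration only. `BlockNotFoundError`, `CollectionReadError`, `read_block` and `missing_block` are hypothetical names, not the SDK's actual classes or functions, and the Keep client is simulated by a plain callable. The locator and portable_data_hash are borrowed from the report below as sample input; pairing them with that filename is an assumption, not a finding.

```python
class BlockNotFoundError(Exception):
    """Stand-in for a low-level Keep error (hypothetical, not the SDK class)."""


class CollectionReadError(Exception):
    """Hypothetical context-aware error naming the affected collection and file."""

    def __init__(self, cause, portable_data_hash, filename, uuid=None):
        self.cause = cause
        self.portable_data_hash = portable_data_hash
        self.filename = filename
        self.uuid = uuid
        parts = ["collection {}".format(portable_data_hash)]
        if uuid:
            # Only mention the uuid if that is how the collection was looked up.
            parts.append("uuid {}".format(uuid))
        parts.append("file {}".format(filename))
        # Keep the text of the original exception at the end of the message.
        super().__init__("error reading {}: {}".format(", ".join(parts), cause))


def read_block(keep_get, locator, portable_data_hash, filename, uuid=None):
    """Fetch one block, re-raising failures with collection/file context."""
    try:
        return keep_get(locator)
    except BlockNotFoundError as error:
        raise CollectionReadError(error, portable_data_hash, filename, uuid) from error


def missing_block(locator):
    # Simulated Keep client that never finds the block.
    raise BlockNotFoundError("{} not found".format(locator))


try:
    read_block(
        missing_block,
        "984fbe6728b638cfb584f20e14026b3d+67108864",  # sample locator
        "3514b8e5da0e8d109946bc809b20a78a+5698",      # sample portable_data_hash
        "human_g1k_v37.fasta",                        # illustrative filename
    )
except CollectionReadError as error:
    message = str(error)
    print(message)
```

Chaining with `raise ... from` keeps the original traceback available, so the low-level detail (which Keep services returned 404) is not lost when the context is added.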
background
my command:
['/tmp/crunch-job-work/bwa/bwa-0.7.5a/bwa', 'mem', '-t', '16', '-c', '100', '-M', '-R', '@RG\tID:RG_ID\tSM:RG_SM\tPL:RG_PL\tLB:RG_LB\tPU:RG_PU', u'/keep/3514b8e5da0e8d109946bc809b20a78a+5698/human_g1k_v37.fasta', u'/keep/3eee80c078a748a950a818157af6a172+1119/xaa.fastq.gz']
log message:
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr 2015-06-10 14:51:50 arvados.arvados_fuse976 WARNING: Block not found: 984fbe6728b638cfb584f20e14026b3d+67108864+A251fa8a621e86009cc976a1f6af214b7e61d2fe6@558ac402 not found: http://keep4.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep3.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep2.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep5.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep1.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep0.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep6.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep7.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr
Updated by Bryan Cosca over 9 years ago
It was this collection 3514b8e5da0e8d109946bc809b20a78a+5698
arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
Traceback (most recent call last):
File "/usr/local/bin/arv-get", line 198, in <module>
for data in f.readall():
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 94, in readall
data = self.read(size, num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 45, in before_close_wrapper
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 201, in read
num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 85, in readfrom
data.append(self._keepget(lr.locator, num_retries=num_retries)[lr.segment_offset:lr.segment_offset+lr.segment_size])
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 74, in _keepget
return self._keep.get(locator, num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 904, in get
"{} not found".format(loc_s), service_errors)
arvados.errors.NotFoundError: 7b7d00d58de30eec31faf02df51ab824+67108864+A6baeb8a3e9982b45d80acc29ef546d7caa17599c@558ac91a not found: http://keep5.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep0.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep6.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep7.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep1.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep3.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep2.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep4.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
Updated by Bryan Cosca over 9 years ago
I successfully used it here: https://cloud.curoverse.com/jobs/qr1hi-8i9sb-a5pa7ekx6wpavpp on 6/5.
Updated by Bryan Cosca over 9 years ago
arv-copy did not help me here:
arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
Traceback (most recent call last):
File "/usr/local/bin/arv-get", line 198, in <module>
for data in f.readall():
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 94, in readall
data = self.read(size, num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 45, in before_close_wrapper
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 201, in read
num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 85, in readfrom
data.append(self._keepget(lr.locator, num_retries=num_retries)[lr.segment_offset:lr.segment_offset+lr.segment_size])
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 74, in _keepget
return self._keep.get(locator, num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 904, in get
"{} not found".format(loc_s), service_errors)
arvados.errors.NotFoundError: 7b7d00d58de30eec31faf02df51ab824+67108864+A6baeb8a3e9982b45d80acc29ef546d7caa17599c@558ac91a not found: http://keep5.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep0.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep6.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep7.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep1.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep3.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep2.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep4.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
1!bcosc@bcosc.qr1hi:/data/2/temp22$ arv-copy
usage: arv-copy [-h] [-v] [--progress] [--no-progress] [-f] --src
SOURCE_ARVADOS --dst DESTINATION_ARVADOS [--recursive]
[--no-recursive] [--dst-git-repo DST_GIT_REPO]
[--project-uuid PROJECT_UUID] [--retries RETRIES]
object_uuid
arv-copy: error: too few arguments
2!bcosc@bcosc.qr1hi:/data/2/temp22$ arv-copy --src su92l --dst qr1hi su92l-4zz18-kk9vpg66od6cmua
2015-06-10 15:18:28 arvados.arv-copy31764 INFO:
2015-06-10 15:18:28 arvados.arv-copy31764 INFO: Success: created copy with uuid qr1hi-4zz18-451d1et0gyddljt
bcosc@bcosc.qr1hi:/data/2/temp22$ arv pipeline run --template=qr1hi-p5p6p-9nx4nes4emp6iui Run-BWA::input=a5235659558206c6df35eeae033d6864+1036 --submit
qr1hi-d1hrv-jp1ju7ezma7yuzh
bcosc@bcosc.qr1hi:/data/2/temp22$ arv ws -j qr1hi-d1hrv-jp1ju7ezma7yuzh
^Cbcosc@bcosc.qr1hi:/data/2/temp22$ arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
arv-get: Local file ./human_g1k_v37.dict already exists.
1!bcosc@bcosc.qr1hi:/data/2/temp22$ ls
human_g1k_v37.dict
bcosc@bcosc.qr1hi:/data/2/temp22$ rm human_g1k_v37.dict
bcosc@bcosc.qr1hi:/data/2/temp22$ arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
Traceback (most recent call last):
File "/usr/local/bin/arv-get", line 198, in <module>
for data in f.readall():
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 94, in readall
data = self.read(size, num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 45, in before_close_wrapper
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 201, in read
num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 85, in readfrom
data.append(self._keepget(lr.locator, num_retries=num_retries)[lr.segment_offset:lr.segment_offset+lr.segment_size])
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 74, in _keepget
return self._keep.get(locator, num_retries=num_retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 904, in get
"{} not found".format(loc_s), service_errors)
arvados.errors.NotFoundError: 7b7d00d58de30eec31faf02df51ab824+67108864+Aa1d1a22cf70b8ed7de6dc81f20ff1fb955bbabd9@558acaf0 not found: http://keep5.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep0.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep6.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep7.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep1.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep3.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep2.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep4.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
Updated by Bryan Cosca over 9 years ago
forced an arv-copy and it works
arv-copy -f --src su92l --dst qr1hi su92l-4zz18-kk9vpg66od6cmua
su92l-4zz18-kk9vpg66od6cmua: 8184M / 8184M 100.0%
2015-06-10 15:42:31 arvados.arv-copy2346 INFO:
2015-06-10 15:42:31 arvados.arv-copy2346 INFO: Success: created copy with uuid qr1hi-4zz18-xorzttm0793uflw
bcosc@bcosc.qr1hi:~$ arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
63 MiB / 8184 MiB 0.8%
Updated by Bryan Cosca over 9 years ago
It looks like the job still can't find that block...
Updated by Ward Vandewege over 9 years ago
- Status changed from New to Resolved
- Target version set to 2015-07-08 sprint
My fault, some volumes were not mounted properly on keep0. Fixed now.
Updated by Ward Vandewege over 9 years ago
- Target version changed from 2015-07-08 sprint to Bug Triage
Updated by Tom Clegg over 9 years ago
- Subject changed from Keep should tell you what file the block is missing from to [SDKs] Python Keep block-r/w errors should indicate which collection/file was unreadable/unwritable as a result
Updated by Tom Clegg over 9 years ago
- Category set to SDKs
- Story points set to 1.0
Updated by Joshua Randall over 9 years ago
I may be having this issue too. An arv-copy (from qr1hi) is failing, but it's hard to know what to do with the error it gives me:
# arv-copy --src qr1hi --dst 7lnae --project-uuid 7lnae-j7d0g-9r429yt0rcrdl8x 2e98fdc8e90f4c48a0714b711767c9ce+76
2e98fdc8e90f4c48a0714b711767c9ce+76: 0M / 11M 0.0%
Traceback (most recent call last):
File "/usr/local/bin/arv-copy", line 4, in <module>
main()
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 119, in main
args)
File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 552, in copy_collection
data = src_keep.get(word)
File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
return orig_func(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 904, in get
"{} not found".format(loc_s), service_errors)
arvados.errors.NotFoundError: 3163cbeef8fd50d8cb85096758b801a3+12404311+Aa00e86837a2af04bd830acce3eb64e8443722c98@55c3220f not found: https://keep.qr1hi.arvadosapi.com:443/ responded with 404 HTTP/1.1 404 Not Found
It would be helpful in particular if the 404 error message included the full URL that returned that status, so that I can check whether it is accessible from elsewhere (in case it could be due to proxy issues). In this case I don't understand what the issue is because I can access this collection and the data within it just fine using the workbench on qr1hi - it looks like the workbench accesses keep objects via its own proxy whereas arv-copy accesses them directly from the keep server.
I note that arv-get already returns more helpful error messages:
# arv-get 3163cbeef8fd50d8cb85096758b801a3+12404311+A65cce716e43a5865b03a66835702ed4f04ebbcd2@55c32396 /tmp/GenomeAnalysisTK.jar
Traceback (most recent call last):
File "/usr/local/bin/arv-get", line 129, in <module>
reader = arvados.CollectionReader(collection, num_retries=args.retries)
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 1616, in __init__
super(CollectionReader, self).__init__(manifest_locator_or_text, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 1188, in __init__
self._populate()
File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 1306, in _populate
error_via_keep))
arvados.errors.NotFoundError: Failed to retrieve collection '3163cbeef8fd50d8cb85096758b801a3+12404311+A65cce716e43a5865b03a66835702ed4f04ebbcd2@55c32396' from either API server (<HttpError 404 when requesting https://qr1hi.arvadosapi.com/arvados/v1/collections/3163cbeef8fd50d8cb85096758b801a3%2B12404311%2BA65cce716e43a5865b03a66835702ed4f04ebbcd2%4055c32396?alt=json returned "Path not found">) or Keep (3163cbeef8fd50d8cb85096758b801a3+12404311+A65cce716e43a5865b03a66835702ed4f04ebbcd2@55c32396 not found: https://keep.qr1hi.arvadosapi.com:443/ responded with 404 HTTP/1.1 404 Not Found ).
Although in this case I still don't understand why workbench is able to get the data while neither the API server nor keep can find it ("https://workbench.qr1hi.arvadosapi.com/collections/2e98fdc8e90f4c48a0714b711767c9ce+76/GenomeAnalysisTK.jar?disposition=attachment&size=12404311" works fine).
Updated by Joshua Randall over 9 years ago
Oh, actually the workbench URL doesn't work fine - it appears to download the file but it is actually a 0 byte file.
Updated by Brett Smith over 9 years ago
- Target version changed from Bug Triage to Arvados Future Sprints
Updated by Ward Vandewege over 3 years ago
- Target version deleted (Arvados Future Sprints)