Bug #6276

[SDKs] Python Keep block-r/w errors should indicate which collection/file was unreadable/unwritable as a result

Added by Bryan Cosca almost 4 years ago. Updated almost 4 years ago.

Status: New
Priority: Normal
Assigned To: -
Category: SDKs
Target version:
Start date: 06/10/2015
Due date:
% Done: 0%
Estimated time:
Story points: 1.0

Description

summary

Errors like "block not found" provide a block locator, but it's generally not obvious why a process was trying to read that block: "Which of the 7 collections used in my job has a missing block?"

Ideally, the error message would tell you:
  • collection portable_data_hash
  • collection uuid and name
  • filename
  • (application-dependent) name of the collection (e.g., script_parameter in a crunch script)

The minimum information would be the collection's portable_data_hash, and the uuid if that's how we looked up the source collection in the first place. This is the most annoying part for the user to do manually, and in most cases it's enough information to get to the next step in troubleshooting/resolving the problem.

implementation

When calling get/put-block methods, catch the relevant exceptions and raise a context-aware exception instead (incorporating the text of the original exception).
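The approach described above could look roughly like the following. This is a minimal sketch, not the SDK's actual code: BlockNotFoundError, CollectionReadError, read_block and failing_get are hypothetical names invented for illustration, and the pairing of the example block locator with that file is illustrative only.

```python
class BlockNotFoundError(Exception):
    """Stand-in for the low-level error raised by a Keep client (hypothetical)."""


class CollectionReadError(Exception):
    """Hypothetical context-aware error naming the affected collection and file."""

    def __init__(self, original, portable_data_hash, filename, uuid=None):
        self.original = original
        self.portable_data_hash = portable_data_hash
        self.filename = filename
        self.uuid = uuid
        # Report the uuid only if that is how the collection was looked up.
        source = portable_data_hash
        if uuid is not None:
            source = "{} ({})".format(uuid, portable_data_hash)
        # Incorporate the text of the original exception.
        super(CollectionReadError, self).__init__(
            "error reading {!r} in collection {}: {}".format(
                filename, source, original))


def read_block(keep_get, locator, portable_data_hash, filename, uuid=None):
    """Fetch one block, re-raising low-level errors with collection context."""
    try:
        return keep_get(locator)
    except BlockNotFoundError as error:
        raise CollectionReadError(error, portable_data_hash, filename, uuid)


def failing_get(locator):
    # Hypothetical block getter that always fails, for demonstration only.
    raise BlockNotFoundError("{} not found".format(locator))


try:
    read_block(failing_get,
               "984fbe6728b638cfb584f20e14026b3d+67108864",
               "3514b8e5da0e8d109946bc809b20a78a+5698",
               "human_g1k_v37.fasta")
except CollectionReadError as error:
    print(error)
```

The wrapper keeps the original exception on the new one, so callers that need the per-service details can still reach them while the message itself answers "which collection and file?".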

background

my command:
['/tmp/crunch-job-work/bwa/bwa-0.7.5a/bwa', 'mem', '-t', '16', '-c', '100', '-M', '-R', '@RG\tID:RG_ID\tSM:RG_SM\tPL:RG_PL\tLB:RG_LB\tPU:RG_PU', u'/keep/3514b8e5da0e8d109946bc809b20a78a+5698/human_g1k_v37.fasta', u'/keep/3eee80c078a748a950a818157af6a172+1119/xaa.fastq.gz']

log message:
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr 2015-06-10 14:51:50 arvados.arvados_fuse976 WARNING: Block not found: 984fbe6728b638cfb584f20e14026b3d+67108864+A251fa8a621e86009cc976a1f6af214b7e61d2fe6@558ac402 not found: http://keep4.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep3.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep2.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep5.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep1.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep0.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep6.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr ; http://keep7.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found\015
2015-06-10_14:51:50 qr1hi-8i9sb-i52ckct0rg38xx6 3798 1 stderr

https://cloud.curoverse.com/collections/2fc686a85146e32cb0c6a707d43add28+85/qr1hi-8i9sb-i52ckct0rg38xx6.log.txt?disposition=inline&size=32378


Related issues

Related to Arvados - Bug #6303: missing block 2 | Resolved | 06/10/2015

History

#1 Updated by Bryan Cosca almost 4 years ago

  • Description updated

#2 Updated by Bryan Cosca almost 4 years ago

It was this collection: 3514b8e5da0e8d109946bc809b20a78a+5698

arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
Traceback (most recent call last):
  File "/usr/local/bin/arv-get", line 198, in <module>
    for data in f.readall():
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 94, in readall
    data = self.read(size, num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 45, in before_close_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 201, in read
    num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 85, in readfrom
    data.append(self._keepget(lr.locator, num_retries=num_retries)[lr.segment_offset:lr.segment_offset+lr.segment_size])
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 74, in _keepget
    return self._keep.get(locator, num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 904, in get
    "{} not found".format(loc_s), service_errors)
arvados.errors.NotFoundError: 7b7d00d58de30eec31faf02df51ab824+67108864+A6baeb8a3e9982b45d80acc29ef546d7caa17599c@558ac91a not found: http://keep5.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep0.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep6.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep7.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep1.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep3.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep2.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep4.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found

#4 Updated by Bryan Cosca almost 4 years ago

arv-copy did not help me here:

arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
Traceback (most recent call last):
  File "/usr/local/bin/arv-get", line 198, in <module>
    for data in f.readall():
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 94, in readall
    data = self.read(size, num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 45, in before_close_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 201, in read
    num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 85, in readfrom
    data.append(self._keepget(lr.locator, num_retries=num_retries)[lr.segment_offset:lr.segment_offset+lr.segment_size])
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 74, in _keepget
    return self._keep.get(locator, num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 904, in get
    "{} not found".format(loc_s), service_errors)
arvados.errors.NotFoundError: 7b7d00d58de30eec31faf02df51ab824+67108864+A6baeb8a3e9982b45d80acc29ef546d7caa17599c@558ac91a not found: http://keep5.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep0.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep6.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep7.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep1.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep3.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep2.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep4.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found

:/data/2/temp22$ arv-copy
usage: arv-copy [-h] [-v] [--progress] [--no-progress] [-f] --src
SOURCE_ARVADOS --dst DESTINATION_ARVADOS [--recursive]
[--no-recursive] [--dst-git-repo DST_GIT_REPO]
[--project-uuid PROJECT_UUID] [--retries RETRIES]
object_uuid
arv-copy: error: too few arguments
:/data/2/temp22$ arv-copy --src su92l --dst qr1hi su92l-4zz18-kk9vpg66od6cmua
2015-06-10 15:18:28 arvados.arv-copy31764 INFO:
2015-06-10 15:18:28 arvados.arv-copy31764 INFO: Success: created copy with uuid qr1hi-4zz18-451d1et0gyddljt
:/data/2/temp22$ arv pipeline run --template=qr1hi-p5p6p-9nx4nes4emp6iui Run-BWA::input=a5235659558206c6df35eeae033d6864+1036 --submit
qr1hi-d1hrv-jp1ju7ezma7yuzh
:/data/2/temp22$ arv ws -j qr1hi-d1hrv-jp1ju7ezma7yuzh
^:/data/2/temp22$ arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
arv-get: Local file ./human_g1k_v37.dict already exists.
:/data/2/temp22$ ls
human_g1k_v37.dict
:/data/2/temp22$ rm human_g1k_v37.dict
:/data/2/temp22$ arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
Traceback (most recent call last):
  File "/usr/local/bin/arv-get", line 198, in <module>
    for data in f.readall():
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 94, in readall
    data = self.read(size, num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 45, in before_close_wrapper
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/arvfile.py", line 201, in read
    num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 85, in readfrom
    data.append(self._keepget(lr.locator, num_retries=num_retries)[lr.segment_offset:lr.segment_offset+lr.segment_size])
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 74, in _keepget
    return self._keep.get(locator, num_retries=num_retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 904, in get
    "{} not found".format(loc_s), service_errors)
arvados.errors.NotFoundError: 7b7d00d58de30eec31faf02df51ab824+67108864+Aa1d1a22cf70b8ed7de6dc81f20ff1fb955bbabd9@558acaf0 not found: http://keep5.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep0.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep6.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep7.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep1.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep3.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep2.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found
; http://keep4.qr1hi.arvadosapi.com:25107/ responded with 404 HTTP/1.1 404 Not Found

#5 Updated by Bryan Cosca almost 4 years ago

forced an arv-copy and it works

arv-copy -f --src su92l --dst qr1hi su92l-4zz18-kk9vpg66od6cmua
su92l-4zz18-kk9vpg66od6cmua: 8184M / 8184M 100.0%
2015-06-10 15:42:31 arvados.arv-copy2346 INFO:
2015-06-10 15:42:31 arvados.arv-copy2346 INFO: Success: created copy with uuid qr1hi-4zz18-xorzttm0793uflw
:~$ arv-get 3514b8e5da0e8d109946bc809b20a78a+5698/ .
63 MiB / 8184 MiB 0.8%

#7 Updated by Ward Vandewege almost 4 years ago

  • Status changed from New to Resolved
  • Target version set to 2015-07-08 sprint

My fault, some volumes were not mounted properly on keep0. Fixed now.

#8 Updated by Ward Vandewege almost 4 years ago

  • Status changed from Resolved to New

#9 Updated by Ward Vandewege almost 4 years ago

  • Target version changed from 2015-07-08 sprint to Bug Triage

#10 Updated by Tom Clegg almost 4 years ago

  • Subject changed from Keep should tell you what file the block is missing from to [SDKs] Python Keep block-r/w errors should indicate which collection/file was unreadable/unwritable as a result

#11 Updated by Tom Clegg almost 4 years ago

  • Description updated

#12 Updated by Tom Clegg almost 4 years ago

  • Category set to SDKs
  • Story points set to 1.0

#13 Updated by Joshua Randall almost 4 years ago

I may be having this issue too. An arv-copy from qr1hi is failing, but it is hard to know what to do with the error it gives me:

# arv-copy --src qr1hi --dst 7lnae --project-uuid 7lnae-j7d0g-9r429yt0rcrdl8x 2e98fdc8e90f4c48a0714b711767c9ce+76
2e98fdc8e90f4c48a0714b711767c9ce+76: 0M / 11M 0.0% Traceback (most recent call last):
  File "/usr/local/bin/arv-copy", line 4, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 119, in main
    args)
  File "/usr/local/lib/python2.7/dist-packages/arvados/commands/arv_copy.py", line 552, in copy_collection
    data = src_keep.get(word)
  File "/usr/local/lib/python2.7/dist-packages/arvados/retry.py", line 154, in num_retries_setter
    return orig_func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 904, in get
    "{} not found".format(loc_s), service_errors)
arvados.errors.NotFoundError: 3163cbeef8fd50d8cb85096758b801a3+12404311+Aa00e86837a2af04bd830acce3eb64e8443722c98@55c3220f not found:  https://keep.qr1hi.arvadosapi.com:443/ responded with 404 HTTP/1.1 404 Not Found

It would be particularly helpful if the 404 error message included the full URL that returned that status, so that I could check whether it is accessible from elsewhere (in case the failure is due to proxy issues). In this case I don't understand what the issue is, because I can access this collection and the data within it just fine using the workbench on qr1hi. It looks like the workbench accesses keep objects via its own proxy, whereas arv-copy accesses them directly from the keep server.
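To illustrate the kind of message meant here, a rough sketch follows. describe_service_errors is a hypothetical helper, not part of the Python SDK, and it assumes that a block is requested at the service root URL followed by the block locator.

```python
def describe_service_errors(locator, service_errors):
    """Build a "not found" message listing the full URL tried on each service.

    service_errors maps a service root URL to the error text it produced
    (a hypothetical structure chosen for this sketch).
    """
    parts = []
    for root, error in sorted(service_errors.items()):
        # Assumption: the block URL is <service root>/<locator>.
        url = root.rstrip("/") + "/" + locator
        parts.append("{} responded with {}".format(url, error))
    return "{} not found: {}".format(locator, "; ".join(parts))


print(describe_service_errors(
    "3163cbeef8fd50d8cb85096758b801a3+12404311",
    {"https://keep.qr1hi.arvadosapi.com:443/": "404 HTTP/1.1 404 Not Found"}))
```

With the full URL in the message, the same request can be retried from another host to tell a proxy problem apart from a genuinely missing block.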

I note that arv-get already returns more helpful error messages:

# arv-get 3163cbeef8fd50d8cb85096758b801a3+12404311+A65cce716e43a5865b03a66835702ed4f04ebbcd2@55c32396 /tmp/GenomeAnalysisTK.jar
Traceback (most recent call last):
  File "/usr/local/bin/arv-get", line 129, in <module>
    reader = arvados.CollectionReader(collection, num_retries=args.retries)
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 1616, in __init__
    super(CollectionReader, self).__init__(manifest_locator_or_text, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 1188, in __init__
    self._populate()
  File "/usr/local/lib/python2.7/dist-packages/arvados/collection.py", line 1306, in _populate
    error_via_keep))
arvados.errors.NotFoundError: Failed to retrieve collection '3163cbeef8fd50d8cb85096758b801a3+12404311+A65cce716e43a5865b03a66835702ed4f04ebbcd2@55c32396' from either API server (<HttpError 404 when requesting https://qr1hi.arvadosapi.com/arvados/v1/collections/3163cbeef8fd50d8cb85096758b801a3%2B12404311%2BA65cce716e43a5865b03a66835702ed4f04ebbcd2%4055c32396?alt=json returned "Path not found">) or Keep (3163cbeef8fd50d8cb85096758b801a3+12404311+A65cce716e43a5865b03a66835702ed4f04ebbcd2@55c32396 not found:  https://keep.qr1hi.arvadosapi.com:443/ responded with 404 HTTP/1.1 404 Not Found
).

In this case, though, I still don't understand why workbench is able to get the data while neither the API server nor keep can find it ("https://workbench.qr1hi.arvadosapi.com/collections/2e98fdc8e90f4c48a0714b711767c9ce+76/GenomeAnalysisTK.jar?disposition=attachment&size=12404311" works fine).

#14 Updated by Joshua Randall almost 4 years ago

Oh, actually the workbench URL doesn't work fine: it appears to download the file, but the result is a 0-byte file.

#15 Updated by Brett Smith almost 4 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints
