Bug #10669
closedgoogleapiclient.errors.InvalidJsonError retrieving discovery document
Description
When logging in to a shell node, I got the error below, apparently because the cached version of the discovery document was corrupted. Deleting the contents of ~/.cache/discovery/ fixed the issue, but I think there are two things that need to be fixed:
1. Make sure that the file gets truncated to the new length if a shorter discovery document is downloaded. The fact that the corrupted version of the file ends with:
,"websocketUrl":"wss://ws.e51c5.arvadosapi.com/websocket"}om/web
makes me suspicious of the garbage at the end of the file.
2. If there's an error parsing the cached file, it should be deleted and re-downloaded to see if the problem can be cleared
No handlers could be found for logger "googleapiclient.discovery"
Traceback (most recent call last):
File "/usr/local/arvados/record-login.py", line 9, in <module>
arv = arvados.api('v1')
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 185, in api
return api_from_config(version=version, cache=cache, **kwargs)
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 244, in api_from_config
return api(version=version, host=host, token=token, insecure=insecure, **kwargs)
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 208, in api
svc = apiclient_discovery.build('arvados', version, **kwargs)
File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 140, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 196, in build
tfmorris@shell:~$ cache)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 256, in _retrieve_discovery_doc
raise InvalidJsonError()
googleapiclient.errors.InvalidJsonError
Files
Updated by Tom Morris about 8 years ago
- File corrupt.json corrupt.json added
- Description updated (diff)
- Assigned To set to Tom Morris
- Target version set to Arvados Future Sprints
Updated by Tom Morris about 8 years ago
- Target version changed from Arvados Future Sprints to 2017-01-04 sprint
Updated by Tom Morris about 8 years ago
- Target version deleted (
2017-01-04 sprint)
Updated by Tom Clegg almost 8 years ago
Discovery doc retrieval is done by google-api-client. We ask it to use httplib2's built-in cache mechanism.
Unfortunately that built-in cache is "Not really safe to use if multiple threads or processes are going to be running on the same cache."
For example:
def set(self, key, value):
cacheFullPath = os.path.join(self.cache, self.safe(key))
f = open(cacheFullPath, "wb")
f.write(value)
f.close()
https://github.com/jcgregorio/httplib2/blob/master/python3/httplib2/__init__.py#L666-L697
Fortunately it looks like we can make an object that supports the same interface as FileCache, and pass that to httplib2.Http() instead of "~/.cache/arvados/discovery".
Updated by Tom Morris almost 8 years ago
- Target version set to 2017-04-12 sprint
Updated by Tom Morris almost 8 years ago
- Target version changed from 2017-04-12 sprint to 2017-03-29 sprint
Updated by Tom Clegg almost 8 years ago
- Category set to SDKs
- Status changed from New to In Progress
Updated by Tom Clegg almost 8 years ago
10669-safe-http-cache @ 514cd364c3cb27b633c1368cd06d6a54927c98a8
This branch uses ~/.cache/arvados/discovery/{md5}.tmp instead of ~/.cache/arvados/discovery/{md5} which has two benefits:- it ignores old corrupt files left behind by the default httplib2 implementation
- it can clean up its own mess by removing *.tmp with much less risk of deleting valuable stuff (even if it somehow ends up using a useful directory as a cache directory)
Updated by Peter Amstutz almost 8 years ago
Did you try the multithreaded test_cache_crud against the regular httplib2 cache? Does the (unsafe) cache fail the test regularly, where the new cache passes?
Updated by Peter Amstutz almost 8 years ago
For what its worth, from my cache directory:
Old filename:
qr1hi.arvadosapi.com,discovery,v1,apis,arvados,v1,rest,86e60ccc4da91c726aa80dfb73fa9cbf
New filename:
86e60ccc4da91c726aa80dfb73fa9cbf.tmp
Not really sure of the original purpose/intent of having the filename contain both a munged URL and the hash of the URL (except maybe to make debugging easier).
Content is the same except timestamps in the headers.
Updated by Tom Clegg almost 8 years ago
Ah yes, older versions of httplib2 used just the md5 but current versions use a munged url plus md5. And yeah, I assume the reason is for debugging, so you don't have to grep for ^content-location:
.
Updated by Tom Clegg almost 8 years ago
Peter Amstutz wrote:
Did you try the multithreaded test_cache_crud against the regular httplib2 cache? Does the (unsafe) cache fail the test regularly, where the new cache passes?
(I hadn't before, but) I tried it, and yes, it fails horribly. Fixed up the test case so it prints a bunch of errors and then fails an assertion, instead of dumping a lot of stack traces.
10669-safe-http-cache @ c56743e301b49163a56482c13e49a01c9a0fd7dc
Updated by Tom Clegg almost 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 50 to 100
Applied in changeset arvados|commit:1667f9860de21d29bbe32bb827db29eca62d9aeb.