Project

General

Profile

Actions

Bug #10669

closed

googleapiclient.errors.InvalidJsonError retrieving discovery document

Added by Tom Morris about 8 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Story points:
1.0

Description

When logging in to a shell node, I got the error below, apparently because the cached version of the discovery document was corrupted. Deleting the contents of ~/.cache/discovery/ fixed the issue, but I think there are two things that need to be fixed:

1. Make sure that the file gets truncated to the new length if a shorter discovery document is downloaded. The fact that the corrupted version of the file ends with:
,"websocketUrl":"wss://ws.e51c5.arvadosapi.com/websocket"}om/web
makes me suspicious of the garbage at the end of the file.

2. If there's an error parsing the cached file, it should be deleted and re-downloaded to see if the problem can be cleared

No handlers could be found for logger "googleapiclient.discovery"
Traceback (most recent call last):
File "/usr/local/arvados/record-login.py", line 9, in <module>
arv = arvados.api('v1')
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 185, in api
return api_from_config(version=version, cache=cache, **kwargs)
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 244, in api_from_config
return api(version=version, host=host, token=token, insecure=insecure, **kwargs)
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 208, in api
svc = apiclient_discovery.build('arvados', version, **kwargs)
File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 140, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 196, in build
tfmorris@shell:~$ cache)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 256, in _retrieve_discovery_doc
raise InvalidJsonError()
googleapiclient.errors.InvalidJsonError


Files

corrupt.json (160 KB) corrupt.json Tom Morris, 12/07/2016 03:20 AM

Subtasks 2 (0 open2 closed)

Task #11267: Review 10669-safe-http-cacheResolvedPeter Amstutz12/06/2016Actions
Task #11318: thread- and process-safe cacheResolvedTom Clegg12/06/2016Actions

Related issues 2 (0 open2 closed)

Related to Arvados - Bug #11226: [SDK] Python client tries to cache discovery document at the same local path regardless of userResolvedPeter Amstutz03/10/2017Actions
Is duplicate of Arvados - Bug #9521: Corrupt discovery document should be handled betterDuplicateTom Morris06/30/2016Actions
Actions #1

Updated by Tom Morris about 8 years ago

  • File corrupt.json corrupt.json added
  • Description updated (diff)
  • Assigned To set to Tom Morris
  • Target version set to Arvados Future Sprints
Actions #2

Updated by Tom Morris about 8 years ago

  • Target version changed from Arvados Future Sprints to 2017-01-04 sprint
Actions #3

Updated by Tom Morris about 8 years ago

  • Story points set to 1.0
Actions #4

Updated by Tom Morris about 8 years ago

  • Target version deleted (2017-01-04 sprint)
Actions #5

Updated by Tom Clegg almost 8 years ago

Discovery doc retrieval is done by google-api-client. We ask it to use httplib2's built-in cache mechanism.

Unfortunately that built-in cache is "Not really safe to use if multiple threads or processes are going to be running on the same cache."

For example:

    def set(self, key, value):
        cacheFullPath = os.path.join(self.cache, self.safe(key))
        f = open(cacheFullPath, "wb")
        f.write(value)
        f.close()

https://github.com/jcgregorio/httplib2/blob/master/python3/httplib2/__init__.py#L666-L697

Fortunately it looks like we can make an object that supports the same interface as FileCache, and pass that to httplib2.Http() instead of "~/.cache/arvados/discovery".

Actions #6

Updated by Tom Morris almost 8 years ago

  • Target version set to 2017-04-12 sprint
Actions #7

Updated by Tom Morris almost 8 years ago

  • Assigned To deleted (Tom Morris)
Actions #8

Updated by Tom Morris almost 8 years ago

  • Target version changed from 2017-04-12 sprint to 2017-03-29 sprint
Actions #9

Updated by Tom Clegg almost 8 years ago

  • Assigned To set to Tom Clegg
Actions #10

Updated by Tom Morris almost 8 years ago

  • Status changed from New to Duplicate
Actions #11

Updated by Tom Morris almost 8 years ago

  • Status changed from Duplicate to New
Actions #12

Updated by Tom Clegg almost 8 years ago

  • Category set to SDKs
  • Status changed from New to In Progress
Actions #13

Updated by Tom Clegg almost 8 years ago

10669-safe-http-cache @ 514cd364c3cb27b633c1368cd06d6a54927c98a8

This branch uses ~/.cache/arvados/discovery/{md5}.tmp instead of ~/.cache/arvados/discovery/{md5} which has two benefits:
  • it ignores old corrupt files left behind by the default httplib2 implementation
  • it can clean up its own mess by removing *.tmp with much less risk of deleting valuable stuff (even if it somehow ends up using a useful directory as a cache directory)
Actions #14

Updated by Peter Amstutz almost 8 years ago

Did you try the multithreaded test_cache_crud against the regular httplib2 cache? Does the (unsafe) cache fail the test regularly, where the new cache passes?

Actions #15

Updated by Peter Amstutz almost 8 years ago

For what its worth, from my cache directory:

Old filename:
qr1hi.arvadosapi.com,discovery,v1,apis,arvados,v1,rest,86e60ccc4da91c726aa80dfb73fa9cbf

New filename:
86e60ccc4da91c726aa80dfb73fa9cbf.tmp

Not really sure of the original purpose/intent of having the filename contain both a munged URL and the hash of the URL (except maybe to make debugging easier).

Content is the same except timestamps in the headers.

Actions #16

Updated by Tom Clegg almost 8 years ago

Ah yes, older versions of httplib2 used just the md5 but current versions use a munged url plus md5. And yeah, I assume the reason is for debugging, so you don't have to grep for ^content-location:.

Actions #17

Updated by Tom Clegg almost 8 years ago

Peter Amstutz wrote:

Did you try the multithreaded test_cache_crud against the regular httplib2 cache? Does the (unsafe) cache fail the test regularly, where the new cache passes?

(I hadn't before, but) I tried it, and yes, it fails horribly. Fixed up the test case so it prints a bunch of errors and then fails an assertion, instead of dumping a lot of stack traces.

10669-safe-http-cache @ c56743e301b49163a56482c13e49a01c9a0fd7dc

Actions #18

Updated by Tom Clegg almost 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:1667f9860de21d29bbe32bb827db29eca62d9aeb.

Actions

Also available in: Atom PDF