Bug #10669

googleapiclient.errors.InvalidJsonError retrieving discovery document

Added by Tom Morris 9 months ago. Updated 5 months ago.

Status:ResolvedStart date:12/06/2016
Priority:NormalDue date:
Assignee:Tom Clegg% Done:

50%

Category:SDKs
Target version:2017-03-29 sprint
Story points1.0Remaining (hours)0.00 hour
Velocity based estimate-

Description

When logging in to a shell node, I got the error below, apparently because the cached version of the discovery document was corrupted. Deleting the contents of ~/.cache/discovery/ fixed the issue, but I think there are two things that need to be fixed:

1. Make sure that the file gets truncated to the new length if a shorter discovery document is downloaded. The fact that the corrupted version of the file ends with:
,"websocketUrl":"wss://ws.e51c5.arvadosapi.com/websocket"}om/web
makes me suspicious of the garbage at the end of the file.

2. If there's an error parsing the cached file, it should be deleted and re-downloaded to see if the problem can be cleared

No handlers could be found for logger "googleapiclient.discovery"
Traceback (most recent call last):
File "/usr/local/arvados/record-login.py", line 9, in <module>
arv = arvados.api('v1')
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 185, in api
return api_from_config(version=version, cache=cache, **kwargs)
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 244, in api_from_config
return api(version=version, host=host, token=token, insecure=insecure, **kwargs)
File "/usr/lib/python2.7/dist-packages/arvados/api.py", line 208, in api
svc = apiclient_discovery.build('arvados', version, **kwargs)
File "/usr/lib/python2.7/dist-packages/oauth2client/util.py", line 140, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 196, in build
tfmorris@shell:~$ cache)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 256, in _retrieve_discovery_doc
raise InvalidJsonError()
googleapiclient.errors.InvalidJsonError

corrupt.json (160 KB) Tom Morris, 12/07/2016 03:20 am


Subtasks

Task #11267: Review 10669-safe-http-cacheResolvedPeter Amstutz

Task #11318: thread- and process-safe cacheResolvedTom Clegg


Related issues

Related to Arvados - Bug #11226: [SDK] Python client tries to cache discovery document at ... Resolved 03/10/2017
Duplicates Arvados - Bug #9521: Corrupt discovery document should be handled better Duplicate 06/30/2016

Associated revisions

Revision 1667f986
Added by Tom Clegg 5 months ago

Merge branch '10669-safe-http-cache'

closes #10669

History

#1 Updated by Tom Morris 9 months ago

  • File corrupt.json added
  • Description updated (diff)
  • Assignee set to Tom Morris
  • Target version set to Arvados Future Sprints

#2 Updated by Tom Morris 8 months ago

  • Target version changed from Arvados Future Sprints to 2017-01-04 sprint

#3 Updated by Tom Morris 8 months ago

  • Story points set to 1.0

#4 Updated by Tom Morris 8 months ago

  • Target version deleted (2017-01-04 sprint)

#5 Updated by Tom Clegg 6 months ago

Discovery doc retrieval is done by google-api-client. We ask it to use httplib2's built-in cache mechanism.

Unfortunately that built-in cache is "Not really safe to use if multiple threads or processes are going to be running on the same cache."

For example:

    def set(self, key, value):
        cacheFullPath = os.path.join(self.cache, self.safe(key))
        f = open(cacheFullPath, "wb")
        f.write(value)
        f.close()

https://github.com/jcgregorio/httplib2/blob/master/python3/httplib2/__init__.py#L666-L697

Fortunately it looks like we can make an object that supports the same interface as FileCache, and pass that to httplib2.Http() instead of "~/.cache/arvados/discovery".

#6 Updated by Tom Morris 5 months ago

  • Target version set to 2017-04-12 sprint

#7 Updated by Tom Morris 5 months ago

  • Assignee deleted (Tom Morris)

#8 Updated by Tom Morris 5 months ago

  • Target version changed from 2017-04-12 sprint to 2017-03-29 sprint

#9 Updated by Tom Clegg 5 months ago

  • Assignee set to Tom Clegg

#10 Updated by Tom Morris 5 months ago

  • Status changed from New to Duplicate

#11 Updated by Tom Morris 5 months ago

  • Status changed from Duplicate to New

#12 Updated by Tom Clegg 5 months ago

  • Category set to SDKs
  • Status changed from New to In Progress

#13 Updated by Tom Clegg 5 months ago

10669-safe-http-cache @ 514cd364c3cb27b633c1368cd06d6a54927c98a8

This branch uses ~/.cache/arvados/discovery/{md5}.tmp instead of ~/.cache/arvados/discovery/{md5} which has two benefits:
  • it ignores old corrupt files left behind by the default httplib2 implementation
  • it can clean up its own mess by removing *.tmp with much less risk of deleting valuable stuff (even if it somehow ends up using a useful directory as a cache directory)

#14 Updated by Peter Amstutz 5 months ago

Did you try the multithreaded test_cache_crud against the regular httplib2 cache? Does the (unsafe) cache fail the test regularly, where the new cache passes?

#15 Updated by Peter Amstutz 5 months ago

For what its worth, from my cache directory:

Old filename:
qr1hi.arvadosapi.com,discovery,v1,apis,arvados,v1,rest,86e60ccc4da91c726aa80dfb73fa9cbf

New filename:
86e60ccc4da91c726aa80dfb73fa9cbf.tmp

Not really sure of the original purpose/intent of having the filename contain both a munged URL and the hash of the URL (except maybe to make debugging easier).

Content is the same except timestamps in the headers.

#16 Updated by Tom Clegg 5 months ago

Ah yes, older versions of httplib2 used just the md5 but current versions use a munged url plus md5. And yeah, I assume the reason is for debugging, so you don't have to grep for ^content-location:.

#17 Updated by Tom Clegg 5 months ago

Peter Amstutz wrote:

Did you try the multithreaded test_cache_crud against the regular httplib2 cache? Does the (unsafe) cache fail the test regularly, where the new cache passes?

(I hadn't before, but) I tried it, and yes, it fails horribly. Fixed up the test case so it prints a bunch of errors and then fails an assertion, instead of dumping a lot of stack traces.

10669-safe-http-cache @ c56743e301b49163a56482c13e49a01c9a0fd7dc

#18 Updated by Tom Clegg 5 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:1667f9860de21d29bbe32bb827db29eca62d9aeb.

Also available in: Atom PDF