Project

General

Profile

Actions

Idea #20187

closed

Cache discovery doc in memory in controller

Added by Tom Clegg about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Story points:
1.0
Release relationship:
Auto

Description

The existing Rails disk cache is not especially efficient (it's wasteful to do a Rails request for static content) and may be unreliable (we've seen 200 status with an empty response body, possibly caused by a full temp disk).

Rather than proxy each GET /discovery/v1/apis/arvados/v1/rest request through to railsapi, controller should
  • (if not already cached) fetch the discovery doc from rails
  • ensure it's valid JSON (if not, return 502)
  • cache it in memory
  • return the cached document from now on
  • periodically refresh the cache (to pick up changes if railsapi restarts with a different config that affects the discovery doc), but while refresh is in progress/failing, continue to return the existing cached document

Subtasks 1 (0 open1 closed)

Task #20197: Review 20187-cache-discovery-docResolvedTom Clegg03/15/2023Actions

Related issues

Related to Arvados - Bug #20203: API server returns 200 OK when it can't cache a documentResolvedTom Clegg04/14/2023Actions
Actions #1

Updated by Tom Clegg about 1 year ago

  • Target version changed from To be scheduled to Development 2023-03-15 sprint
  • Assigned To set to Tom Clegg
Actions #2

Updated by Brett Smith about 1 year ago

Tom Clegg wrote:

  • ensure it's valid JSON (if not, return 502)

Just to add color, we have user feedback that the error return is a key part of this story.

I think it would be nice if it did a little more validation than just "valid JSON," like checking that it "looks like" the discovery document by looking up a required field or two.

Actions #3

Updated by Brett Smith about 1 year ago

  • Related to Bug #20203: API server returns 200 OK when it can't cache a document added
Actions #5

Updated by Tom Clegg about 1 year ago

  • Status changed from New to In Progress
Actions #7

Updated by Tom Clegg about 1 year ago

  • Target version changed from Development 2023-03-15 sprint to Development 2023-03-29 Sprint
Actions #8

Updated by Tom Clegg about 1 year ago

20187-cache-discovery-doc @ c16b905d89fb7c2b9ed8f91ca299f08874b58dab -- developer-run-tests: #3552

Adds the "really a discovery doc" check.

Actions #9

Updated by Lucas Di Pentima about 1 year ago

Some comments & questions:

  • File lib/controller/handler.go
    • Line 174: Is there a reason to have this validateDD flag? Right now we're just always setting it as true, AFAICT.
    • Related to the above: cacheEnt seems to be written as a kind of generic cache entry struct, but then it got specific with the discovery doc validation stuff. Couldn't we just pass an optional validation func to the struct instead, to maintain its flexibility for future different uses? if it that doesn't make sense, maybe just name the struct something more explicit like ddCacheEnt?
  • File lib/controller/handler_test.go
    • Line 243: I think it would be convenient to have a wantError = true to make the test sub-case more explicit and not depend on the previous sub-case (just in case code gets moved around without much checking)
  • Shouldn't we also handle a different kind of TTL where the cached response is not valid any more (like DNS does), so that persisting underlying issues on RailsAPI that avoid the cache to be refreshed are eventually visible from the client's perspective?
  • The ticket's description mentions that Rails uses an unreliable disk cache for the DD. Would it be a good idea to disable that cache now?
Actions #10

Updated by Tom Clegg about 1 year ago

Lucas Di Pentima wrote in #note-9:

  • Line 174: Is there a reason to have this validateDD flag? Right now we're just always setting it as true, AFAICT.
  • Related to the above: cacheEnt seems to be written as a kind of generic cache entry struct, but then it got specific with the discovery doc validation stuff. Couldn't we just pass an optional validation func to the struct instead, to maintain its flexibility for future different uses? if it that doesn't make sense, maybe just name the struct something more explicit like ddCacheEnt?

Ah yes, a validation function does seem much nicer. Updated.

  • Line 243: I think it would be convenient to have a wantError = true to make the test sub-case more explicit and not depend on the previous sub-case (just in case code gets moved around without much checking)

Updated to set wantError / wantBadContent explicitly for each section.

  • Shouldn't we also handle a different kind of TTL where the cached response is not valid any more (like DNS does), so that persisting underlying issues on RailsAPI that avoid the cache to be refreshed are eventually visible from the client's perspective?

Hm, I think this makes sense, especially with the Rails caching removed. What's a good expiry time? I've used 24h (that was the Rails cache ttl) but maybe ~1 hour would make more sense?

  • The ticket's description mentions that Rails uses an unreliable disk cache for the DD. Would it be a good idea to disable that cache now?

Yes, removed the rails cache.

20187-cache-discovery-doc @ fe1c718ce87a72b84f59215decc613d4d5010dce -- developer-run-tests: #3553

Actions #11

Updated by Brett Smith about 1 year ago

Tom Clegg wrote in #note-10:

  • The ticket's description mentions that Rails uses an unreliable disk cache for the DD. Would it be a good idea to disable that cache now?

Yes, removed the rails cache.

I am concerned this is one of those changes that's easy to implement, and seems right in this context, but could have surprising far-reaching consequences. Do we know what else Rails is caching for production API servers? Are we sure it's not caching any intermediate data?

I hope it's a good change, and as long as it is I'm fine with making it. But it's outside the original scope of the ticket as written, and a significant enough change I'd feel more comfortable if the team had a chance to weigh in.

Actions #12

Updated by Tom Clegg about 1 year ago

To clarify, I didn't remove the Rails cache entirely, I just stopped using it to explicitly cache the discovery doc.

Actions #13

Updated by Brett Smith about 1 year ago

Tom Clegg wrote in #note-12:

To clarify, I didn't remove the Rails cache entirely, I just stopped using it to explicitly cache the discovery doc.

Okay yeah that basically mitigates all my concerns then. Thanks for clarifying.

Actions #14

Updated by Lucas Di Pentima about 1 year ago

Just one extra suggestion, otherwise LGTM:

  • There's one initializer and 2 test files that clear the discovery doc's Rails cache entry, that I guess we can remove.

Thanks!

Actions #15

Updated by Peter Amstutz about 1 year ago

  • Release set to 57
Actions #17

Updated by Tom Clegg about 1 year ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF