Idea #20187
closedCache discovery doc in memory in controller
Description
The existing Rails disk cache is not especially efficient (it's wasteful to do a Rails request for static content) and may be unreliable (we've seen 200 status with an empty response body, possibly caused by a full temp disk).
Rather than proxy eachGET /discovery/v1/apis/arvados/v1/rest
request through to railsapi, controller should
- (if not already cached) fetch the discovery doc from rails
- ensure it's valid JSON (if not, return 502)
- cache it in memory
- return the cached document from now on
- periodically refresh the cache (to pick up changes if railsapi restarts with a different config that affects the discovery doc), but while refresh is in progress/failing, continue to return the existing cached document
Updated by Tom Clegg almost 2 years ago
- Target version changed from To be scheduled to Development 2023-03-15 sprint
- Assigned To set to Tom Clegg
Updated by Brett Smith almost 2 years ago
Tom Clegg wrote:
- ensure it's valid JSON (if not, return 502)
Just to add color, we have user feedback that the error return is a key part of this story.
I think it would be nice if it did a little more validation than just "valid JSON," like checking that it "looks like" the discovery document by looking up a required field or two.
Updated by Brett Smith almost 2 years ago
- Related to Bug #20203: API server returns 200 OK when it can't cache a document added
Updated by Tom Clegg almost 2 years ago
20187-cache-discovery-doc @ dae3da4d70bfba901f2147775dd6c07be61416b2 -- developer-run-tests: #3551
Updated by Tom Clegg almost 2 years ago
- Target version changed from Development 2023-03-15 sprint to Development 2023-03-29 Sprint
Updated by Tom Clegg almost 2 years ago
20187-cache-discovery-doc @ c16b905d89fb7c2b9ed8f91ca299f08874b58dab -- developer-run-tests: #3552
Adds the "really a discovery doc" check.
Updated by Lucas Di Pentima almost 2 years ago
Some comments & questions:
- File
lib/controller/handler.go
- Line 174: Is there a reason to have this
validateDD
flag? Right now we're just always setting it as true, AFAICT. - Related to the above:
cacheEnt
seems to be written as a kind of generic cache entry struct, but then it got specific with the discovery doc validation stuff. Couldn't we just pass an optional validation func to the struct instead, to maintain its flexibility for future different uses? if it that doesn't make sense, maybe just name the struct something more explicit likeddCacheEnt
?
- Line 174: Is there a reason to have this
- File
lib/controller/handler_test.go
- Line 243: I think it would be convenient to have a
wantError = true
to make the test sub-case more explicit and not depend on the previous sub-case (just in case code gets moved around without much checking)
- Line 243: I think it would be convenient to have a
- Shouldn't we also handle a different kind of TTL where the cached response is not valid any more (like DNS does), so that persisting underlying issues on RailsAPI that avoid the cache to be refreshed are eventually visible from the client's perspective?
- The ticket's description mentions that Rails uses an unreliable disk cache for the DD. Would it be a good idea to disable that cache now?
Updated by Tom Clegg almost 2 years ago
Lucas Di Pentima wrote in #note-9:
- Line 174: Is there a reason to have this
validateDD
flag? Right now we're just always setting it as true, AFAICT.- Related to the above:
cacheEnt
seems to be written as a kind of generic cache entry struct, but then it got specific with the discovery doc validation stuff. Couldn't we just pass an optional validation func to the struct instead, to maintain its flexibility for future different uses? if it that doesn't make sense, maybe just name the struct something more explicit likeddCacheEnt
?
Ah yes, a validation function does seem much nicer. Updated.
- Line 243: I think it would be convenient to have a
wantError = true
to make the test sub-case more explicit and not depend on the previous sub-case (just in case code gets moved around without much checking)
Updated to set wantError / wantBadContent explicitly for each section.
- Shouldn't we also handle a different kind of TTL where the cached response is not valid any more (like DNS does), so that persisting underlying issues on RailsAPI that avoid the cache to be refreshed are eventually visible from the client's perspective?
Hm, I think this makes sense, especially with the Rails caching removed. What's a good expiry time? I've used 24h (that was the Rails cache ttl) but maybe ~1 hour would make more sense?
- The ticket's description mentions that Rails uses an unreliable disk cache for the DD. Would it be a good idea to disable that cache now?
Yes, removed the rails cache.
20187-cache-discovery-doc @ fe1c718ce87a72b84f59215decc613d4d5010dce -- developer-run-tests: #3553
Updated by Brett Smith almost 2 years ago
Tom Clegg wrote in #note-10:
- The ticket's description mentions that Rails uses an unreliable disk cache for the DD. Would it be a good idea to disable that cache now?
Yes, removed the rails cache.
I am concerned this is one of those changes that's easy to implement, and seems right in this context, but could have surprising far-reaching consequences. Do we know what else Rails is caching for production API servers? Are we sure it's not caching any intermediate data?
I hope it's a good change, and as long as it is I'm fine with making it. But it's outside the original scope of the ticket as written, and a significant enough change I'd feel more comfortable if the team had a chance to weigh in.
Updated by Tom Clegg almost 2 years ago
To clarify, I didn't remove the Rails cache entirely, I just stopped using it to explicitly cache the discovery doc.
Updated by Brett Smith almost 2 years ago
Tom Clegg wrote in #note-12:
To clarify, I didn't remove the Rails cache entirely, I just stopped using it to explicitly cache the discovery doc.
Okay yeah that basically mitigates all my concerns then. Thanks for clarifying.
Updated by Lucas Di Pentima almost 2 years ago
Just one extra suggestion, otherwise LGTM:
- There's one initializer and 2 test files that clear the discovery doc's Rails cache entry, that I guess we can remove.
Thanks!
Updated by Tom Clegg almost 2 years ago
20187-cache-discovery-doc @ ce5ad31aac392b38889fc04640fcd06474cebeb6 -- developer-run-tests: #3562
Updated by Tom Clegg almost 2 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|2d3dc22dac4d83aa444a61f717a83aee7b17cf6d.