Feature #11809

[keep-web] Cache collections and permissions

Added by Tom Clegg about 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Start date:
06/05/2017
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

Background

It's common for a client to make lots of requests for the same collection using the same token (e.g., static assets for a web page, or many small excerpts from a large indexed bam file).

The Go SDK automatically caches the file data, but before the file data cache can even be used, keep-web retrieves the collection from the API server in order to verify permission and determine which portions of which blocks to return. This API call causes unnecessarily high latency when data is cached, and dominates the overall response time when the response data is small.

Proposed solution

Use LRU caches for
  • collection content (uuid → pdh)
  • manifests (pdh → manifest)
  • permission lookups ((token, uuid-or-pdh) → bool)

If there is a Cache-Control request header, skip the cache and do the API call as before. Except: If the request specifies a PDH and the manifest is already in the cache, the only information needed is the permission check, so the API call should use the "select" parameter to avoid retrieving and returning the manifest unnecessarily.

Use TwoQueueCache from https://github.com/hashicorp/golang-lru or something similar.

The cache sizes should be configurable via /etc/arvados/keep-web/keep-web.yml, with defaults:
  • UUIDCacheEntries: 100
    PermissionCacheEntries: 100
    ManifestCacheEntries: 100
    
The memory occupied by the manifest cache depends on the size of the manifests more than the number, so there should be a mechanism for limiting it by total size. For example, after adding a manifest bigger than ManifestCacheBytes÷ManifestCacheEntries, scan the cache, and delete old items if needed to bring memory usage down to ManifestCacheBytes.
  • ManifestCacheBytes: 100000000

Metrics

Respond to "GET /status.json" (only if there is no collection ID in the Host request header!) with current cache stats.
  • {
      "UUIDCacheHits": 1234,
      "UUIDCacheMisses": 2345,
      "PermissionCacheHits": 123,
      "PermissionCacheMisses": 234,
      "ManifestCacheHits": 1234,
      "ManifestCacheMisses": 2345,
      "ManifestCacheEntries": 100,
      "ManifestCacheBytes": 12345678
    }
    

Subtasks

Task #11810: Cache collection lookupsResolvedTom Clegg

Task #11811: TestsResolvedTom Clegg

Task #11812: Review 11809-keep-web-cacheResolvedRadhika Chippada

Associated revisions

Revision 29cb16bd
Added by Tom Clegg about 2 years ago

Merge branch '11809-keep-web-cache'

closes #11809

History

#1 Updated by Tom Clegg about 2 years ago

  • Category set to Keep
  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg
  • Target version set to 2017-06-07 sprint

#2 Updated by Tom Clegg about 2 years ago

  • Description updated (diff)

#3 Updated by Tom Clegg about 2 years ago

#4 Updated by Radhika Chippada about 2 years ago

  • In type cache struct, the names such as CollectionEntries, CollectionBytes etc are not clearly communicating the fact that they are “max” sizes on these (they feel like they are holding collection entries, collection bytes etc instead). I think calling them MaxCollectionEntries or NumCollections or something like that would be much better from documentation perspective
  • Do TTL, CollectionEntries etc in type cache struct need to start with an upper case letter or can they all start with lower case like stats, pdhs etc? (they all are only used locally)
  • Do the Requests, CollectionHits etc in cacheStats need to start with upper case letters?
  • When size > cache.CollectionBytes, pruneCollections func removes collections. What is the order of the keys here? Is this deleting oldest / least used first? If not, wondering if we should remove from the last key in reverse order?
  • do we want to pruneCollections in a separate go routine rather than adding extra delay during Get?
  • Should we make the defaults for CollectionEntries, PermissionEntries, and UUIDEntries 1000? This might help if the collections are small?
  • Do we want to add another test with ForceReuse=true?
  • Do we need to “print” cache stats to log or something in main?
  • please ignore my earlier comment about test failures. Tests are failing for me in master branch as well …

#5 Updated by Tom Clegg about 2 years ago

Radhika Chippada wrote:

  • In type cache struct, the names such as CollectionEntries, CollectionBytes etc are not clearly communicating the fact that they are “max” sizes on these (they feel like they are holding collection entries, collection bytes etc instead). I think calling them MaxCollectionEntries or NumCollections or something like that would be much better from documentation perspective

Renamed to MaxCollectionEntries etc.

  • Do TTL, CollectionEntries etc in type cache struct need to start with an upper case letter or can they all start with lower case like stats, pdhs etc? (they all are only used locally)
  • Do the Requests, CollectionHits etc in cacheStats need to start with upper case letters?

Yes, only public fields are loaded/exported in json config/stats.

  • When size > cache.CollectionBytes, pruneCollections func removes collections. What is the order of the keys here? Is this deleting oldest / least used first? If not, wondering if we should remove from the last key in reverse order?

Yes, Keys() returns keys in oldest-to-newest order.

  • do we want to pruneCollections in a separate go routine rather than adding extra delay during Get?

Indeed. Added a "go" to this.

  • Should we make the defaults for CollectionEntries, PermissionEntries, and UUIDEntries 1000? This might help if the collections are small?

Yes, 100 does seem unnecessarily small.

  • Do we want to add another test with ForceReuse=true?

Yes, added.

  • Do we need to “print” cache stats to log or something in main?

Now available via /status.json.

11809-keep-web-cache @ 72e22b49ec2721d3a1369da768d3d74fa9c079c3

#6 Updated by Radhika Chippada about 2 years ago

  • Would it be desirable to add a test for status.json requests in hander_test?

LGTM. Thanks.

#7 Updated by Tom Clegg about 2 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:29cb16bdd726b09f8cded0d245ed6a72c62eaf8b.

Also available in: Atom PDF