Project

General

Profile

Actions

Feature #11809

closed

[keep-web] Cache collections and permissions

Added by Tom Clegg over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Story points:
-

Description

Background

It's common for a client to make lots of requests for the same collection using the same token (e.g., static assets for a web page, or many small excerpts from a large indexed bam file).

The Go SDK automatically caches the file data, but before the file data cache can even be used, keep-web retrieves the collection from the API server in order to verify permission and determine which portions of which blocks to return. This API call causes unnecessarily high latency when data is cached, and dominates the overall response time when the response data is small.

Proposed solution

Use LRU caches for
  • collection content (uuid → pdh)
  • manifests (pdh → manifest)
  • permission lookups ((token, uuid-or-pdh) → bool)

If there is a Cache-Control request header, skip the cache and do the API call as before. Except: If the request specifies a PDH and the manifest is already in the cache, the only information needed is the permission check, so the API call should use the "select" parameter to avoid retrieving and returning the manifest unnecessarily.

Use TwoQueueCache from https://github.com/hashicorp/golang-lru or something similar.

The cache sizes should be configurable via /etc/arvados/keep-web/keep-web.yml, with defaults:
  • UUIDCacheEntries: 100
    PermissionCacheEntries: 100
    ManifestCacheEntries: 100
    
The memory occupied by the manifest cache depends on the size of the manifests more than the number, so there should be a mechanism for limiting it by total size. For example, after adding a manifest bigger than ManifestCacheBytes÷ManifestCacheEntries, scan the cache, and delete old items if needed to bring memory usage down to ManifestCacheBytes.
  • ManifestCacheBytes: 100000000

Metrics

Respond to "GET /status.json" (only if there is no collection ID in the Host request header!) with current cache stats.
  • {
      "UUIDCacheHits": 1234,
      "UUIDCacheMisses": 2345,
      "PermissionCacheHits": 123,
      "PermissionCacheMisses": 234,
      "ManifestCacheHits": 1234,
      "ManifestCacheMisses": 2345,
      "ManifestCacheEntries": 100,
      "ManifestCacheBytes": 12345678
    }
    

Subtasks 3 (0 open3 closed)

Task #11810: Cache collection lookupsResolvedTom Clegg06/05/2017Actions
Task #11811: TestsResolvedTom Clegg06/05/2017Actions
Task #11812: Review 11809-keep-web-cacheResolvedRadhika Chippada06/05/2017Actions
Actions #1

Updated by Tom Clegg over 7 years ago

  • Category set to Keep
  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg
  • Target version set to 2017-06-07 sprint
Actions #2

Updated by Tom Clegg over 7 years ago

  • Description updated (diff)
Actions #3

Updated by Tom Clegg over 7 years ago

Actions #4

Updated by Radhika Chippada over 7 years ago

  • In type cache struct, the names such as CollectionEntries, CollectionBytes etc are not clearly communicating the fact that they are “max” sizes on these (they feel like they are holding collection entries, collection bytes etc instead). I think calling them MaxCollectionEntries or NumCollections or something like that would be much better from documentation perspective
  • Do TTL, CollectionEntries etc in type cache struct need to start with an upper case letter or can they all start with lower case like stats, pdhs etc? (they all are only used locally)
  • Do the Requests, CollectionHits etc in cacheStats need to start with upper case letters?
  • When size > cache.CollectionBytes, pruneCollections func removes collections. What is the order of the keys here? Is this deleting oldest / least used first? If not, wondering if we should remove from the last key in reverse order?
  • do we want to pruneCollections in a separate go routine rather than adding extra delay during Get?
  • Should we make the defaults for CollectionEntries, PermissionEntries, and UUIDEntries 1000? This might help if the collections are small?
  • Do we want to add another test with ForceReuse=true?
  • Do we need to “print” cache stats to log or something in main?
  • please ignore my earlier comment about test failures. Tests are failing for me in master branch as well …
Actions #5

Updated by Tom Clegg over 7 years ago

Radhika Chippada wrote:

  • In type cache struct, the names such as CollectionEntries, CollectionBytes etc are not clearly communicating the fact that they are “max” sizes on these (they feel like they are holding collection entries, collection bytes etc instead). I think calling them MaxCollectionEntries or NumCollections or something like that would be much better from documentation perspective

Renamed to MaxCollectionEntries etc.

  • Do TTL, CollectionEntries etc in type cache struct need to start with an upper case letter or can they all start with lower case like stats, pdhs etc? (they all are only used locally)
  • Do the Requests, CollectionHits etc in cacheStats need to start with upper case letters?

Yes, only public fields are loaded/exported in json config/stats.

  • When size > cache.CollectionBytes, pruneCollections func removes collections. What is the order of the keys here? Is this deleting oldest / least used first? If not, wondering if we should remove from the last key in reverse order?

Yes, Keys() returns keys in oldest-to-newest order.

  • do we want to pruneCollections in a separate go routine rather than adding extra delay during Get?

Indeed. Added a "go" to this.

  • Should we make the defaults for CollectionEntries, PermissionEntries, and UUIDEntries 1000? This might help if the collections are small?

Yes, 100 does seem unnecessarily small.

  • Do we want to add another test with ForceReuse=true?

Yes, added.

  • Do we need to “print” cache stats to log or something in main?

Now available via /status.json.

11809-keep-web-cache @ 72e22b49ec2721d3a1369da768d3d74fa9c079c3

Actions #6

Updated by Radhika Chippada over 7 years ago

  • Would it be desirable to add a test for status.json requests in hander_test?

LGTM. Thanks.

Actions #7

Updated by Tom Clegg over 7 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:29cb16bdd726b09f8cded0d245ed6a72c62eaf8b.

Actions

Also available in: Atom PDF