Feature #11809

closed

[keep-web] Cache collections and permissions

Added by Tom Clegg almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Story points: -

Description

Background

It's common for a client to make lots of requests for the same collection using the same token (e.g., static assets for a web page, or many small excerpts from a large indexed bam file).

The Go SDK automatically caches the file data, but before the file data cache can even be used, keep-web retrieves the collection from the API server in order to verify permission and determine which portions of which blocks to return. This API call causes unnecessarily high latency when data is cached, and dominates the overall response time when the response data is small.

Proposed solution

Use LRU caches for
  • collection content (uuid → pdh)
  • manifests (pdh → manifest)
  • permission lookups ((token, uuid-or-pdh) → bool)
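The three lookups above could be sketched as follows. This is a minimal illustration using only the standard library, not keep-web's actual code: it uses a plain LRU in place of the TwoQueueCache mentioned below, and the names `lruCache`, `permKey`, `caches`, and `newCaches` are hypothetical.

```go
package main

import "container/list"

// entry is one key/value pair stored in the recency list.
type entry[K comparable, V any] struct {
	key K
	val V
}

// lruCache is a fixed-capacity least-recently-used cache.
// It is not safe for concurrent use; a server would guard it with a mutex.
type lruCache[K comparable, V any] struct {
	max   int
	order *list.List          // front = most recently used
	items map[K]*list.Element // key -> element in order
}

func newLRU[K comparable, V any](max int) *lruCache[K, V] {
	return &lruCache[K, V]{max: max, order: list.New(), items: make(map[K]*list.Element)}
}

// Get returns the cached value, if any, and marks it as recently used.
func (c *lruCache[K, V]) Get(k K) (V, bool) {
	if el, ok := c.items[k]; ok {
		c.order.MoveToFront(el)
		return el.Value.(entry[K, V]).val, true
	}
	var zero V
	return zero, false
}

// Add inserts or updates a value, evicting the least recently used
// entry when the cache grows past its capacity.
func (c *lruCache[K, V]) Add(k K, v V) {
	if el, ok := c.items[k]; ok {
		el.Value = entry[K, V]{key: k, val: v}
		c.order.MoveToFront(el)
		return
	}
	c.items[k] = c.order.PushFront(entry[K, V]{key: k, val: v})
	if c.order.Len() > c.max {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(entry[K, V]).key)
	}
}

// permKey identifies one permission lookup: (token, uuid-or-pdh).
type permKey struct {
	token  string
	target string
}

// caches groups the three lookups listed in the proposal.
type caches struct {
	pdhByUUID   *lruCache[string, string] // uuid -> pdh
	manifests   *lruCache[string, string] // pdh -> manifest text
	permissions *lruCache[permKey, bool]  // (token, uuid-or-pdh) -> allowed
}

// newCaches creates all three caches with the same entry limit.
func newCaches(entries int) *caches {
	return &caches{
		pdhByUUID:   newLRU[string, string](entries),
		manifests:   newLRU[string, string](entries),
		permissions: newLRU[permKey, bool](entries),
	}
}
```

Keying the permission cache on a struct of both token and target keeps one token's result from being served to another token.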

If there is a Cache-Control request header, skip the cache and do the API call as before, with one exception: if the request specifies a PDH and the manifest is already in the cache, the only information needed is the permission check, so the API call should use the "select" parameter to avoid retrieving and returning the manifest unnecessarily.
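The Cache-Control rule can be written as a small decision function. This is a sketch only: `lookupPlan`, its three constants, and `planLookup` are hypothetical names, and the API call itself, including the fields passed to "select", is left out.

```go
package main

import "net/http"

// lookupPlan says how a request should obtain collection information.
type lookupPlan int

const (
	useCache            lookupPlan = iota // consult the caches; call the API only on a miss
	fetchPermissionOnly                   // call the API with "select" so the manifest is not returned
	fetchFull                             // call the API as before, manifest included
)

// planLookup applies the rule described above. targetIsPDH reports whether
// the request names a portable data hash rather than a UUID, and
// manifestCached reports whether that manifest is already in the cache.
func planLookup(h http.Header, targetIsPDH, manifestCached bool) lookupPlan {
	if len(h.Values("Cache-Control")) == 0 {
		return useCache
	}
	if targetIsPDH && manifestCached {
		// A PDH always names the same content, so only the
		// permission check needs a fresh answer.
		return fetchPermissionOnly
	}
	return fetchFull
}
```

The sketch treats any Cache-Control header as a bypass request, as the description does, without inspecting its directives.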

Use TwoQueueCache from https://github.com/hashicorp/golang-lru or something similar.

The cache sizes should be configurable via /etc/arvados/keep-web/keep-web.yml, with defaults:
    UUIDCacheEntries: 100
    PermissionCacheEntries: 100
    ManifestCacheEntries: 100
    
The memory occupied by the manifest cache depends on the size of the manifests more than the number, so there should be a mechanism for limiting it by total size. For example, after adding a manifest bigger than ManifestCacheBytes÷ManifestCacheEntries, scan the cache, and delete old items if needed to bring memory usage down to ManifestCacheBytes.
    ManifestCacheBytes: 100000000
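The size limit described above could work roughly as follows. This is a simplified sketch: `manifestCache`, `manifestEntry`, and their methods are hypothetical, and entries are kept in insertion order instead of true LRU order to keep the example short.

```go
package main

// manifestEntry is one cached manifest, keyed by portable data hash.
type manifestEntry struct {
	pdh      string
	manifest string
}

// manifestCache limits both the number of entries and their total size.
type manifestCache struct {
	maxEntries int             // corresponds to ManifestCacheEntries
	maxBytes   int             // corresponds to ManifestCacheBytes
	entries    []manifestEntry // oldest first
}

// Add stores a manifest, enforces the entry limit, and prunes by size only
// when the new manifest is bigger than maxBytes / maxEntries.
func (c *manifestCache) Add(pdh, manifest string) {
	c.entries = append(c.entries, manifestEntry{pdh: pdh, manifest: manifest})
	if len(c.entries) > c.maxEntries {
		c.entries = c.entries[1:] // drop the oldest entry
	}
	// Equivalent to len(manifest) > maxBytes/maxEntries, written as a
	// multiplication to avoid integer division.
	if len(manifest)*c.maxEntries > c.maxBytes {
		c.prune()
	}
}

// Bytes returns the total size of all cached manifests.
func (c *manifestCache) Bytes() int {
	total := 0
	for _, e := range c.entries {
		total += len(e.manifest)
	}
	return total
}

// prune scans the cache and deletes the oldest entries until the total
// size is at most maxBytes.
func (c *manifestCache) prune() {
	total := c.Bytes()
	drop := 0
	for total > c.maxBytes && drop < len(c.entries) {
		total -= len(c.entries[drop].manifest)
		drop++
	}
	c.entries = c.entries[drop:]
}
```

The threshold makes the scan rare: if every cached manifest is at most ManifestCacheBytes÷ManifestCacheEntries bytes, the entry limit alone keeps the total within ManifestCacheBytes, so only an oversized manifest can push the cache over.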

Metrics

Respond to "GET /status.json" (only if there is no collection ID in the Host request header!) with current cache stats.
    {
      "UUIDCacheHits": 1234,
      "UUIDCacheMisses": 2345,
      "PermissionCacheHits": 123,
      "PermissionCacheMisses": 234,
      "ManifestCacheHits": 1234,
      "ManifestCacheMisses": 2345,
      "ManifestCacheEntries": 100,
      "ManifestCacheBytes": 12345678
    }
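One possible shape for the status endpoint is sketched below. The `cacheStats` struct mirrors the JSON keys in the example response; `statusJSON`, `withStatus`, and the `hostHasCollectionID` callback are hypothetical, and the logic for recognizing a collection ID in the Host header is assumed to exist elsewhere.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// cacheStats holds the counters reported by /status.json.
// The field names double as the JSON keys.
type cacheStats struct {
	UUIDCacheHits         uint64
	UUIDCacheMisses       uint64
	PermissionCacheHits   uint64
	PermissionCacheMisses uint64
	ManifestCacheHits     uint64
	ManifestCacheMisses   uint64
	ManifestCacheEntries  uint64
	ManifestCacheBytes    uint64
}

// statusJSON encodes the stats as a JSON object.
func statusJSON(s cacheStats) ([]byte, error) {
	return json.Marshal(s)
}

// withStatus answers "GET /status.json" with the current stats, but only
// when the Host header does not name a collection. All other requests go
// to next. hostHasCollectionID is supplied by the caller (assumption).
func withStatus(next http.Handler, stats func() cacheStats, hostHasCollectionID func(host string) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet || r.URL.Path != "/status.json" || hostHasCollectionID(r.Host) {
			// Possibly a file named status.json inside a collection.
			next.ServeHTTP(w, r)
			return
		}
		body, err := statusJSON(stats())
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	})
}
```

Passing unmatched requests to `next` means a collection that contains its own status.json file is still served normally.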
    

Subtasks: 3 (0 open, 3 closed)

Task #11810: Cache collection lookups (Resolved, Tom Clegg, 06/05/2017)
Task #11811: Tests (Resolved, Tom Clegg, 06/05/2017)
Task #11812: Review 11809-keep-web-cache (Resolved, Radhika Chippada, 06/05/2017)