Keep cache policy

Currently Keep uses a LRU cache policy. This has very bad worst case behavior when the cache size is a bit smaller than the data size, and the reader does multiple sequential scans of the data. On each pass, every block has to be reloaded.


Switch to a hybrid LRU/random eviction policy.

The top N blocks or bytes is protected by LRU policy. The remaining blocks are subject to random eviction.

Example: with a 1 GiB cache (16 blocks) and 1.1 GiB of data (18 blocks), with current LRU policy the overage of 2 blocks means the oldest blocks are evicted right before they are used on the next pass. There is a 100% miss rate.

With random eviction, with N=4 the top 4 blocks are protected and 12 remaining cache slots are considered for eviction with an overage of 2. This means there is a 2/12 (16.6%) chance of evicting the next block.

Note: N should be set at a minimum to the number of simultaneous readers. The Collection API and/or FUSE mount could adjust this dynamically based on the number of open file descriptors.

Behavior changes:

If the cache is large enough to store the entire working set, random replacement does not change the behavior.

For a single sequential scan through a large file, random replacement does not change behavior (protected region representing current frontier is not evicted.)

For repeated access to a working set that is > top N blocks, followed by moving to a new working set, random replacement may keep blocks from the old set and evict blocks in the new set that would have been kept with LRU (although top blocks would remain protected.)

Updated by Peter Amstutz over 7 years ago · 1 revisions