Feature #10541

Updated by Tom Clegg over 4 years ago

h2. Background

Certain common scenarios -- for example, starting up a job -- cause many clients to retrieve the same data block at around the same time.

Rationale: when using blob-backed storage like S3, Keepstore no longer benefits from the kernel block cache to read frequently-requested blocks from RAM. So we need to build a caching mechanism into Keepstore; something external (squid, etc.) won't do because of permission signatures.

h2. Proposed improvement

Extend the memory buffer allocation system to act as a cache of recently fetched data.
* When performing a GET operation, if another goroutine is already processing a GET request for the same block, wait for it to finish finding/retrieving the block data, and then respond with that data instead of re-fetching.
* When returning a buffer to the pool, attach the relevant block hash instead of wiping the buffer, so the data can be reused by a future GET request.
* When getting a buffer from the pool to service a GET request, first try to get a buffer that already has data for the requested block. If one is available, use the cached data instead of reading it from a volume, and update that buffer's last-read timestamp.
* When getting a buffer for a PUT request, or a GET request for un-cached data, allocate a new one or (if max buffers are already allocated) reuse the least recently used buffer, i.e., the filled buffer with the oldest last-read timestamp.
* As before, if all buffers are already in use, wait until one is available or (after reaching max clients) return 503.
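
The last bullet (wait for a buffer, or return 503 once max clients is reached) could be sketched roughly as follows. This is only an illustration, not Keepstore's existing code: the @limiter@ type, @newLimiter@, and @serve@ are hypothetical names, and the sketch assumes max clients and max buffers are two separate limits.

```go
package main

import "net/http"

// limiter is a hypothetical admission gate (not Keepstore's actual type).
// At most cap(clients) requests are admitted at once, and admitted
// requests compete for cap(buffers) buffer tokens.
type limiter struct {
	clients chan struct{}
	buffers chan struct{}
}

func newLimiter(maxClients, maxBuffers int) *limiter {
	return &limiter{
		clients: make(chan struct{}, maxClients),
		buffers: make(chan struct{}, maxBuffers),
	}
}

// serve runs work while holding a buffer token and returns the HTTP
// status the handler would send.
func (l *limiter) serve(work func()) int {
	select {
	case l.clients <- struct{}{}:
		defer func() { <-l.clients }()
	default:
		// Max clients reached: refuse immediately instead of queueing.
		return http.StatusServiceUnavailable
	}
	l.buffers <- struct{}{} // blocks until a buffer is available
	defer func() { <-l.buffers }()
	work()
	return http.StatusOK
}
```

With @newLimiter(1, 1)@, a request arriving while another is being served gets 503; with a larger max clients value it would wait for the buffer instead.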

h2. Implementation ideas

Instead of a sync.Pool of byte slices, add a buffer type (containing a []byte, a reference counter, and whatever else is needed for synchronization) and use a []*buffer as an LRU list. Maintain a map[string]*buffer mapping hashes to buffers (some buffers won't have a corresponding hash, e.g., after errors).
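
A minimal sketch of the buffer type, LRU list, and hash map described above, to make the idea concrete. The names @bufferLRU@, @get@, @put@, and @touch@ are assumptions for illustration, and a single mutex stands in for "whatever else is needed for synchronization". Waiting for a free buffer and sharing of in-flight fetches are left out.

```go
package main

import "sync"

// buffer is a pooled block buffer. hash is "" when the contents are not
// a valid cached block (e.g., never filled, or filled after an error).
type buffer struct {
	data []byte
	hash string
	refs int // handlers currently using this buffer
}

// bufferLRU is a hypothetical name for the list-plus-map structure.
type bufferLRU struct {
	mu     sync.Mutex
	max    int
	size   int
	lru    []*buffer          // least recently used first
	byHash map[string]*buffer // only buffers holding valid data
}

func newBufferLRU(max, size int) *bufferLRU {
	return &bufferLRU{max: max, size: size, byHash: map[string]*buffer{}}
}

// touch moves b to the most recently used end of the list.
func (p *bufferLRU) touch(b *buffer) {
	for i, x := range p.lru {
		if x == b {
			copy(p.lru[i:], p.lru[i+1:])
			p.lru[len(p.lru)-1] = b
			return
		}
	}
}

// get returns a buffer for hash, whether it already holds the block
// data, and false if every buffer is allocated and in use.
func (p *bufferLRU) get(hash string) (*buffer, bool, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if b := p.byHash[hash]; b != nil {
		b.refs++
		p.touch(b)
		return b, true, true
	}
	if len(p.lru) < p.max {
		b := &buffer{data: make([]byte, 0, p.size), refs: 1}
		p.lru = append(p.lru, b)
		return b, false, true
	}
	for _, b := range p.lru {
		if b.refs == 0 { // repurpose the least recently used idle buffer
			delete(p.byHash, b.hash)
			b.hash, b.data, b.refs = "", b.data[:0], 1
			p.touch(b)
			return b, false, true
		}
	}
	return nil, false, false
}

// put releases one reference. A non-empty hash marks the data as
// reusable; pass "" after an error so the data is never served.
func (p *bufferLRU) put(b *buffer, hash string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	b.refs--
	if hash != "" && b.hash == "" && p.byHash[hash] == nil {
		b.hash = hash
		p.byHash[hash] = b
	}
}
```

A linear scan of the slice is used here for brevity; with a small, fixed number of buffers that may be acceptable, but it is a design choice of this sketch rather than something the proposal specifies.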

Add a bufferPool type to manage buffer sharing and repurposing.
* All overlapping GET requests for a given hash should share the same buffer.
* If requests are sharing a buffer and the data isn't already present, one handler attempts to fetch the data while the others wait. If an error occurs, all of the waiting requests return an error.
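
The sharing behaviour for overlapping GET requests might look something like the sketch below: the first handler for a hash performs the fetch, and later handlers for the same hash block until it finishes and then see the same data or the same error. @fetchGroup@, @sharedBuffer@, @sharers@, and @errNotFound@ are hypothetical names, and the @fetch@ callback stands in for reading the block from a volume. Caching of completed fetches is deliberately left out.

```go
package main

import (
	"errors"
	"sync"
)

// errNotFound stands in for a volume read failure (illustrative only).
var errNotFound = errors.New("block not found")

// sharedBuffer is one in-flight fetch shared by overlapping requests.
type sharedBuffer struct {
	ready   chan struct{} // closed once data and err are set
	data    []byte
	err     error
	sharers int // handlers that have joined this fetch
}

type fetchGroup struct {
	mu       sync.Mutex
	inflight map[string]*sharedBuffer
}

func newFetchGroup() *fetchGroup {
	return &fetchGroup{inflight: map[string]*sharedBuffer{}}
}

// get returns the block for hash. If another handler is already
// fetching it, get waits for that fetch and returns its result,
// including its error, instead of calling fetch again.
func (g *fetchGroup) get(hash string, fetch func() ([]byte, error)) ([]byte, error) {
	g.mu.Lock()
	if sb, ok := g.inflight[hash]; ok {
		sb.sharers++
		g.mu.Unlock()
		<-sb.ready // wait for the fetching handler to finish
		return sb.data, sb.err
	}
	sb := &sharedBuffer{ready: make(chan struct{}), sharers: 1}
	g.inflight[hash] = sb
	g.mu.Unlock()

	sb.data, sb.err = fetch()

	g.mu.Lock()
	delete(g.inflight, hash)
	g.mu.Unlock()
	close(sb.ready) // wake every waiting handler
	return sb.data, sb.err
}
```

Closing the @ready@ channel wakes all waiters at once, which is why a failed fetch causes every waiting request to return the error rather than each one retrying.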