Feature #10541


[Keep] Share buffers between overlapping/consecutive GET requests for the same block

Added by Tom Clegg over 7 years ago. Updated 27 days ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: 2.0
Release: -
Release relationship: Auto

Description

Background

Certain common scenarios -- for example, starting up a job -- cause many clients to retrieve the same data block at around the same time.

Proposed improvement

Extend the memory buffer allocation system to act as a cache of recently fetched data. (A rough handler-level sketch follows the list below.)
  • When performing a GET operation, if another goroutine is already processing a GET request for the same block, wait for it to finish finding/retrieving the block data, and then respond with that data instead of re-fetching.
  • When returning a buffer to the pool, attach the relevant block hash so the data can be reused by a future GET request.
  • When getting a buffer from the pool to service a GET request, first try to get a buffer that already has data for the requested block. If that is available, use the cached data instead of reading it from a volume.
  • When getting a buffer for a PUT request, or a GET request for un-cached data, allocate a new one or (if max buffers are already allocated) use the least recently used buffer.
  • As before, if all buffers are already in use, wait until one is available or (after reaching max clients) return 503.
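A rough sketch of how a GET handler could use such a cache-aware pool, assuming the behavior in the list above. The blockCache interface, errTooManyClients, and readBlockFromVolume names are hypothetical, introduced only for illustration; they are not existing keepstore identifiers.

```go
package keepstore

import (
	"errors"
	"net/http"
)

// blockCache is a hypothetical view of the proposed pool: Get returns
// cached or shared data for hash, or calls fetch exactly once to read
// the block from a volume while any overlapping requests wait.
type blockCache interface {
	Get(hash string, fetch func(buf []byte) (int, error)) ([]byte, error)
}

// errTooManyClients stands in for "all buffers busy and max clients reached".
var errTooManyClients = errors.New("no buffers available")

func handleGET(w http.ResponseWriter, req *http.Request, cache blockCache, hash string) {
	data, err := cache.Get(hash, func(buf []byte) (int, error) {
		// Only one of the overlapping requests runs this; the rest
		// wait and reuse the result (volume I/O details elided).
		return readBlockFromVolume(hash, buf)
	})
	switch {
	case errors.Is(err, errTooManyClients):
		// As before: return 503 rather than queueing indefinitely.
		http.Error(w, err.Error(), http.StatusServiceUnavailable)
	case err != nil:
		http.Error(w, err.Error(), http.StatusNotFound)
	default:
		w.Write(data)
	}
}

// readBlockFromVolume is a placeholder for keepstore's existing volume reads.
func readBlockFromVolume(hash string, buf []byte) (int, error) {
	return copy(buf, hash), nil
}
```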

Implementation ideas

Instead of a sync.Pool of byte slices, add a buffer type (containing a []byte, a reference counter, and whatever else is needed for synchronization) and use a []*buffer as an LRU list. Maintain a map[string]*buffer mapping hashes to buffers (some buffers won't have a corresponding hash, e.g., after errors).
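One possible shape for these types, as a sketch; the field and type names are illustrative rather than existing keepstore code.

```go
package keepstore

import "sync"

// buffer holds one block's worth of data plus the bookkeeping needed to
// share it between overlapping GET requests and reuse it as a cache entry.
type buffer struct {
	data     []byte        // BlockSize-capacity slice
	size     int           // number of valid bytes in data
	hash     string        // hash of the cached block, or "" (e.g. after an error)
	refcount int           // handlers currently using this buffer; 0 means reusable
	done     chan struct{} // closed when an in-flight fetch into this buffer finishes
	err      error         // fetch result, shared with any waiting handlers
}

// bufferPool hands out buffers, remembers which block each idle buffer
// holds, and keeps an LRU order so un-cached requests can repurpose the
// least recently used buffer once maxBuffers have been allocated.
type bufferPool struct {
	mtx        sync.Mutex
	maxBuffers int
	allocated  int                // buffers allocated so far
	byHash     map[string]*buffer // block hash -> buffer holding (or fetching) it
	lru        []*buffer          // least recently used first; eviction candidates
}
```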

Add a bufferPool type to manage buffer sharing and repurposing. (A sketch of the shared-fetch behavior follows this list.)
  • All overlapping GET requests for a given hash should share the same buffer.
  • If requests are sharing a buffer and the data isn't already present, one handler attempts to fetch the data while the others wait. If an error occurs, all of the waiting requests return an error.
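A minimal, runnable sketch of that shared-fetch behavior, ignoring buffer reuse and the LRU list for clarity. The sharedFetcher and entry types are hypothetical; a completed fetch simply stays in the map as the cache.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is one shared fetch: the first caller fills data/err and closes
// done; later callers wait on done and reuse the same result.
type entry struct {
	done chan struct{}
	data []byte
	err  error
}

// sharedFetcher deduplicates concurrent GETs for the same hash, keeping
// completed results around as a simple cache (no eviction in this sketch).
type sharedFetcher struct {
	mtx    sync.Mutex
	blocks map[string]*entry
}

func (f *sharedFetcher) Get(hash string, fetch func() ([]byte, error)) ([]byte, error) {
	f.mtx.Lock()
	if e, ok := f.blocks[hash]; ok {
		f.mtx.Unlock()
		<-e.done             // returns immediately if the fetch already finished
		return e.data, e.err // on a failed fetch, every waiter gets the same error
	}
	e := &entry{done: make(chan struct{})}
	f.blocks[hash] = e
	f.mtx.Unlock()

	e.data, e.err = fetch() // only this goroutine touches the volumes
	close(e.done)           // wake everyone waiting for this block
	if e.err != nil {
		f.mtx.Lock()
		delete(f.blocks, hash) // don't cache failures
		f.mtx.Unlock()
	}
	return e.data, e.err
}

func main() {
	f := &sharedFetcher{blocks: map[string]*entry{}}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			data, err := f.Get("acbd18db", func() ([]byte, error) {
				fmt.Println("reading block from volume (prints once)")
				return []byte("block data"), nil
			})
			fmt.Println(len(data), err)
		}()
	}
	wg.Wait()
}
```

This is the same pattern provided by golang.org/x/sync/singleflight, except that here the completed result is retained and keyed by hash so later GETs can reuse it as cached data.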

Related issues

Related to Arvados - Feature #2960: Keepstore can stream GET and PUT requests using keep-gateway API (In Progress, Tom Clegg)