Keep index

Purposes of index:
  • Tell garbage collector what is eligible for deletion (and some partial order of preference)
  • Tell replication enforcer how many replicas of each block should be stored (and in which [types of] backing store)
  • Tell rebalancer which blocks should be moved to redistribute free space and reduce probe time
  • Tell managers how much disk space is being saved by content-addressed storage (CAS)
  • Tell managers how much disk space is occupied in a given backing store service
  • Tell managers how disk usage would be affected by modifying storage policy
  • Tell users how much disk space is represented by a given set of collections
  • Tell users how much disk space can be made available by garbage collection
  • Tell users how soon they should expect their cached data to disappear
  • Tell users performance statistics (how fast should I expect my job to read data?)
  • Tell ops where each block was most recently read/written, in case data recovery is needed
  • Tell ops how unbalanced the backing stores are across the cluster
  • Tell ops activity level and performance statistics
  • Tell ops activity level vs. amount of space (how much of the data is being accessed by users?)
  • Tell ops disk performance/error/status trends to help identify bad hardware
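Two of the questions above (space saved by CAS, and space reclaimable by garbage collection) reduce to simple set arithmetic once the index knows which blocks each collection references. The following is a minimal sketch under stated assumptions: the input shapes and the function names `cas_savings` and `gc_eligible_bytes` are hypothetical and do not describe an actual Keep index schema or API.

```python
from typing import Dict, List, Tuple

# Hypothetical input shapes (assumptions for illustration only):
#   collections: collection id -> list of (block hash, size in bytes) references
#   stored:      block hash -> size in bytes, for every block held in a backing store
Collections = Dict[str, List[Tuple[str, int]]]


def cas_savings(collections: Collections) -> int:
    """Bytes saved because identical blocks are stored once.

    Logical size counts every reference; physical size counts each
    distinct block hash once.
    """
    logical = sum(size for refs in collections.values() for _, size in refs)
    unique: Dict[str, int] = {}
    for refs in collections.values():
        for block_hash, size in refs:
            unique[block_hash] = size
    physical = sum(unique.values())
    return logical - physical


def gc_eligible_bytes(collections: Collections, stored: Dict[str, int]) -> int:
    """Bytes held by stored blocks that no collection references."""
    referenced = {h for refs in collections.values() for h, _ in refs}
    return sum(size for h, size in stored.items() if h not in referenced)
```

This sketch treats every unreferenced block as eligible for deletion. A real garbage collector would also need the "partial order of preference" mentioned above, for example by taking block age or recent read events into account.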
Basic kinds of data in the index:
  • Which blocks are used by which collections (and which collections are valued by which users/groups)
  • Which blocks are stored on which disks
  • Which disks are attached to which nodes
  • Read events
  • Write events
  • Exceptions (checksum mismatch, IO error)
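As an illustration of how the block-to-disk and disk-to-node relations listed above could serve the replication enforcer, here is a minimal sketch. The `BlockLocation` type, the function names, and the choice to count replicas per distinct node (rather than per disk) are all assumptions made for this example, not part of any existing Keep interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BlockLocation:
    """One 'block is stored on disk' fact (hypothetical record shape)."""
    block: str  # block hash
    disk: str   # disk identifier


def replica_counts(locations: Iterable[BlockLocation],
                   disk_to_node: Dict[str, str]) -> Dict[str, int]:
    """Count, for each block, the number of distinct nodes holding a copy.

    Assumption: two copies on different disks of the same node count as
    one replica, since they share a failure domain.
    """
    nodes_per_block = defaultdict(set)
    for loc in locations:
        nodes_per_block[loc.block].add(disk_to_node[loc.disk])
    return {block: len(nodes) for block, nodes in nodes_per_block.items()}


def under_replicated(locations: Iterable[BlockLocation],
                     disk_to_node: Dict[str, str],
                     wanted: Dict[str, int]) -> List[str]:
    """Return blocks whose replica count is below the desired level."""
    counts = replica_counts(locations, disk_to_node)
    return sorted(b for b, want in wanted.items() if counts.get(b, 0) < want)
```

The `wanted` mapping stands in for whatever storage policy determines the desired replication level per block. Read events, write events, and exceptions are not modeled here.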

Implementation considerations

Overview
  • REST service
  • API server may cache/proxy some queries
  • API server may redirect some queries
Permissions
  • Support +A tokens, as the Keep server does, when accepting collection/blob UUIDs in a request?
  • Require an admin api_token for some queries (site-configurable)?
Distributed/asynchronous
  • It should be easy to run multiple Keep index services.
  • Most features do not need synchronous operation / real time data.
  • Features that move or delete data should be tied to a single "primary" indexing service (failover event likely requires resetting some state).
  • Substantial disagreement between multiple index services should be easy to flag on admin dashboard.
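One way to make disagreement between index services easy to flag is to compare their views of where each block is stored and report the fraction of blocks on which they differ. The sketch below assumes each service can export a mapping from block hash to the set of disks it believes hold that block. The function names and the 5% default threshold are hypothetical choices for illustration, not an existing dashboard feature.

```python
from typing import Dict, FrozenSet

# Hypothetical snapshot shape: block hash -> set of disk identifiers
IndexView = Dict[str, FrozenSet[str]]


def disagreement_ratio(view_a: IndexView, view_b: IndexView) -> float:
    """Fraction of blocks (known to either service) whose locations differ.

    A block missing from one view is treated as stored nowhere in that view.
    """
    blocks = set(view_a) | set(view_b)
    if not blocks:
        return 0.0
    empty: FrozenSet[str] = frozenset()
    differing = sum(
        1 for b in blocks if view_a.get(b, empty) != view_b.get(b, empty)
    )
    return differing / len(blocks)


def should_flag(view_a: IndexView, view_b: IndexView,
                threshold: float = 0.05) -> bool:
    """True when disagreement exceeds the (assumed) alert threshold."""
    return disagreement_ratio(view_a, view_b) > threshold
```

Because most features do not need real-time data, some disagreement is expected while services catch up with recent writes. A threshold, rather than an exact-match check, keeps that normal lag from raising alerts.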