Keep server

This page describes the Keep backing store server component, keepd.

See also:

Todo

  • Spec private key deployment mechanism
  • Spec event-reporting API
  • Spec quota mechanism

Responsibilities

  • Read and write blobs on disk
  • Remember when each blob was last written1
  • Enforce maximum blob size
  • Enforce key=hash(value) during read and write
  • Enforce permissions when reading data (according to permissions on Collections in the metadata DB)
  • Enforce usage quota when writing data
  • Delete blobs (only when requested by data manager!)
  • Report read/write/exception events
  • Report used & free space
  • Report hardware status (SMART)
  • Report list of blobs on disk (hash, size, time last stored)

1 This helps with garbage collection. Re-writing an already-stored blob should push it to the back of the garbage collection queue. Ordering garbage collection this way provides a fair and more or less predictable interval between write (from the client's perspective) and earliest potential deletion.

Other parties

  • Client distributes data across the available Keep servers (using the content hash)
  • Client attains initial replication level when writing blobs (by writing to multiple Keep servers)
  • Data manager decides which blobs to delete (e.g., garbage collection, rebalancing)

Discovering Keep server URIs

Supported methods

For storage clients
  • GET /hash
  • GET /hash?checksum=true → verify checksum before sending
  • PUT /hash (body=content) → hash
  • HEAD /hash → does it exist here?
  • HEAD /hash?checksum=true → read the data and verify checksum
For system (monitoring, indexing, garbage collection)
  • DELETE /hash → delete all copies of this blob (requires privileged token!)
  • GET /index.txt → get full list of blocks stored here, including size and unix timestamp of most recent PUT (requires privileged token)
  • GET /state.json → get list of backing filesystems, disk fullness, IO counters, perhaps recent IO statistics (requires privileged token)

Example index.txt:

37b51d194a7513e45b56f6524f2d51f2+3 1396976219
acbd18db4cc2f85cedef654fccc4a4d8+3 1396976187

Example status.json:

{
 "volumes":[
  {"mount_point":"/data/disk0","bytes_free":4882337792,"bytes_used":5149708288},
  {"mount_point":"/data/disk1","bytes_free":39614472192,"bytes_used":3314229248}
 ]
}

Authentication

  • Client provides API token in Authorization header
  • Config knob to ignore authentication & permissions (for fully-shared site, and help transition from Keep1)

Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time}, establishes permission to read a block.

The controller and each Keep server has a private key.

Writing:
  • If the given hash and content agree, whether or not a disk write is required, Keep server creates a +Asignature@expirytime portion to the returned blob locator.
  • The API server collections.create method verifies signatures before giving the current user can_read permission on the collection.
  • A suitably intelligent client can notice that the expirytimes on its blob hashes are getting old, and refresh them by generating a partial manifest, calling collections.create followed by collections.get, and optionally deleting the partial manifest(s) when the full manifest is written. If extra partial manifests are left around, garbage collection will take care of them eventually; the only odd side effect is the existence of partial manifests. (Should there be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)
Reading:
  • The API server collections.get method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a +Asignature@expirytime portion on each blob locator.
  • Keep server verifies signatures before honoring GET requests.
  • The signature might come from either the Keep node itself, a different Keep node, or the API server.
  • A suitably intelligent client can notice that the expirytime on its blob hashes is too old, and request a fresh set via collections.get.