Keep 2.0

This page specifies a design for version 2.0 of the Keep backing store server component, keepd.

See also:

Design Goals

Content-addressable storage

Keep implements a content-addressable filesystem. An object stored in Keep is identified by a hash of its content; it is not possible for two objects in Keep to have the same content but different identifiers.

Fault tolerance

Keep double-checks the content hash of an object on both reads and writes, to protect against data corruption on the network or on disk.
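
Both properties come down to the same check: the hash embedded in a blob's locator must equal the hash of the bytes received over the network or read from disk. A minimal sketch of that check, assuming MD5 as the content hash (this section does not pin down the algorithm):

    package main

    import (
        "crypto/md5"
        "fmt"
    )

    // verifyBlob re-computes the content hash and compares it to the locator
    // hash, which is the check keepd performs on every read and write.
    // MD5 is an assumption; the hash algorithm is not fixed in this section.
    func verifyBlob(locatorHash string, data []byte) bool {
        return fmt.Sprintf("%x", md5.Sum(data)) == locatorHash
    }

    func main() {
        blob := []byte("foo")
        hash := fmt.Sprintf("%x", md5.Sum(blob)) // acbd18db4cc2f85cedef654fccc4a4d8
        fmt.Println(verifyBlob(hash, blob))              // true
        fmt.Println(verifyBlob(hash, []byte("corrupt"))) // false: reject the request
    }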

Todo

  • Implement server daemon (in progress)
  • Implement integration test suite (in progress)
  • Spec public/private key format and deployment mechanism
  • Spec permission signature format
  • Spec event-reporting API
  • Spec quota mechanism

Responsibilities

  • Read and write blobs on disk
  • Enforce maximum blob size
  • Enforce key=hash(value) during read and write
  • Enforce permissions when reading data (according to permissions on Collections in the metadata DB)
  • Enforce usage quota when writing data
  • Delete blobs (only when requested by data manager!)
  • Report read/write/exception events
  • Report free space
  • Report hardware status (SMART)

Other parties

  • Client distributes data across the available Keep servers using the content hash (see the sketch after this list)
  • Client achieves the initial replication level when writing blobs (by writing to multiple Keep servers)
  • Data manager decides which blobs to delete (e.g., garbage collection, rebalancing)
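
One way a client can use the content hash to spread blobs across servers is a rendezvous-style ordering: hash the blob locator together with each server's identity and probe servers in the resulting order, so every client independently derives the same preferred servers for a given blob. The sketch below is illustrative only; the server names are hypothetical and this is not necessarily the exact ordering the Arvados client uses.

    package main

    import (
        "crypto/md5"
        "fmt"
        "sort"
    )

    // probeOrder sorts the servers into a deterministic, per-blob order derived
    // only from the content hash and the server identities.
    func probeOrder(blobHash string, servers []string) []string {
        ordered := append([]string(nil), servers...)
        sort.Slice(ordered, func(i, j int) bool {
            return weight(blobHash, ordered[i]) < weight(blobHash, ordered[j])
        })
        return ordered
    }

    func weight(blobHash, server string) string {
        return fmt.Sprintf("%x", md5.Sum([]byte(blobHash+server)))
    }

    func main() {
        servers := []string{"keep0.example", "keep1.example", "keep2.example"}
        // Write to the first N servers in probe order to reach the desired
        // replication level; read by probing servers in the same order.
        fmt.Println(probeOrder("acbd18db4cc2f85cedef654fccc4a4d8", servers))
    }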

Discovering Keep server URIs

Supported methods

For storage clients (an example exchange appears after this list)
  • GET /hash
  • GET /hash?checksum=true → verify checksum before sending
  • POST / (body=content) → hash
  • PUT /hash (body=content) → hash
  • HEAD /hash → does it exist here?
  • HEAD /hash?checksum=true → read the data and verify checksum
For system (monitoring, indexing, garbage collection)
  • DELETE /hash → delete all copies of this blob (requires privileged token!)
  • GET /index.txt → get full list of blocks stored here, including size [and whether it was PUT recently?] (requires privileged token)
  • GET /state.json → get list of backing filesystems, disk fullness, IO counters, perhaps recent IO statistics (requires privileged token)
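
The sketch below shows a storage client exercising PUT and GET from the list above over plain HTTP. The server address, port, and API token are placeholders, and the "OAuth2" Authorization scheme is an assumption rather than something this page specifies.

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        const server = "http://keep0.example:25107" // hypothetical Keep server
        const token = "example-api-token"           // hypothetical API token
        blob := []byte("hello world\n")
        hash := "6f5902ac237024bdd0c176cb93063dc4" // md5 of blob

        // PUT /hash (body=content) -> hash: store the blob under its content hash.
        req, _ := http.NewRequest("PUT", server+"/"+hash, bytes.NewReader(blob))
        req.Header.Set("Authorization", "OAuth2 "+token) // scheme is an assumption
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        locator, _ := io.ReadAll(resp.Body)
        resp.Body.Close()
        fmt.Println("stored:", string(locator))

        // GET /hash: fetch the blob back; the client re-checks the hash itself.
        req, _ = http.NewRequest("GET", server+"/"+hash, nil)
        req.Header.Set("Authorization", "OAuth2 "+token)
        resp, err = http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        body, _ := io.ReadAll(resp.Body)
        resp.Body.Close()
        fmt.Printf("read %d bytes\n", len(body))
    }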

Authentication

  • Client provides API token in Authorization header
  • Config knob to ignore authentication & permissions (for a fully shared site, and to ease the transition from Keep1)

Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time} tuple, establishes permission to read a block.

The controller and each Keep server have a private key. Everyone can know the public keys (but only the controller and Keep servers need to know them; clients don't need to verify signatures).

Writing:
  • If the given hash and content agree, whether or not a disk write is required, the Keep server appends a +Asignature@expirytime portion to the returned blob locator.
  • The API server collections.create method verifies signatures before giving the current user can_read permission on the collection.
  • A suitably intelligent client can notice that the expirytimes on its blob hashes are getting old, and refresh them by generating a partial manifest, calling collections.create followed by collections.get, and optionally deleting the partial manifest(s) when the full manifest is written. If extra partial manifests are left around, garbage collection will take care of them eventually; the only odd side effect is the existence of partial manifests. (Should there be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)
Reading:
  • The API server collections.get method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a +Asignature@expirytime portion on each blob locator.
  • Keep server verifies signatures before honoring GET requests (see the sketch below).
  • The signature might come from either the Keep node itself, a different Keep node, or the API server.
  • A suitably intelligent client can notice that the expirytime on its blob hashes is too old, and request a fresh set via collections.get.
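
The sketch below illustrates one way the +Asignature@expirytime hint could be produced and verified: sign the {blob_hash, arvados_api_token, expiry_time} tuple with a private key and check it with the corresponding public key, as described above. Ed25519 and the exact message and encoding formats are assumptions; the key and signature formats are still to be specced (see the Todo list).

    package main

    import (
        "crypto/ed25519"
        "crypto/rand"
        "encoding/hex"
        "fmt"
        "strings"
    )

    // signLocator appends a +Asignature@expirytime hint to a plain blob locator.
    // The signed message covers the {blob_hash, api_token, expiry_time} tuple.
    func signLocator(priv ed25519.PrivateKey, blobHash, apiToken string, expiry int64) string {
        msg := fmt.Sprintf("%s %s %x", blobHash, apiToken, expiry)
        sig := ed25519.Sign(priv, []byte(msg))
        return fmt.Sprintf("%s+A%s@%x", blobHash, hex.EncodeToString(sig), expiry)
    }

    // verifyLocator is the check a Keep server would run before honoring a GET.
    func verifyLocator(pub ed25519.PublicKey, signedLocator, apiToken string) bool {
        plusA := strings.Index(signedLocator, "+A")
        at := strings.LastIndex(signedLocator, "@")
        if plusA < 0 || at < plusA {
            return false
        }
        blobHash := signedLocator[:plusA]
        sig, err := hex.DecodeString(signedLocator[plusA+2 : at])
        if err != nil {
            return false
        }
        expiry := signedLocator[at+1:]
        msg := fmt.Sprintf("%s %s %s", blobHash, apiToken, expiry)
        return ed25519.Verify(pub, []byte(msg), sig)
    }

    func main() {
        pub, priv, _ := ed25519.GenerateKey(rand.Reader)
        signed := signLocator(priv, "acbd18db4cc2f85cedef654fccc4a4d8", "example-token", 0x53400000)
        fmt.Println(signed)
        fmt.Println("valid:", verifyLocator(pub, signed, "example-token"))
    }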
