Tom Clegg, 02/14/2014 10:09 AM


Keep server

This page describes the Keep backing store server component, keepd.

Responsibilities

  • Read and write blobs on disk
  • Enforce maximum blob size
  • Enforce key=hash(value) during read and write
  • Enforce permissions when reading data (according to permissions on Collections in the metadata DB)
  • Enforce usage quota when writing data
  • Delete blobs (only when requested by data manager!)
  • Report read/write/exception events
  • Report free space
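The size and key=hash(value) checks above could look roughly like the following. This is a minimal sketch, not keepd's actual code: it assumes MD5 as the hash function and a 64 MiB size limit, neither of which is stated on this page, and the names check_blob and BlobError are made up for illustration.

```python
import hashlib

# Assumed limit; this page does not give a number.
MAX_BLOB_SIZE = 64 * 1024 * 1024


class BlobError(Exception):
    """Raised when a blob violates the size or checksum rule."""


def check_blob(key, data, max_size=MAX_BLOB_SIZE):
    """Return the content hash if the blob is acceptable, else raise BlobError."""
    if len(data) > max_size:
        raise BlobError("blob exceeds maximum size")
    digest = hashlib.md5(data).hexdigest()  # assumed hash function
    if digest != key.lower():
        raise BlobError("key does not match hash of content")
    return digest
```

The same check would apply on both paths: before storing data on a write, and (when the client asks for checksum verification) before sending data on a read.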

Other parties

  • Client distributes data across the available Keep servers (using the content hash)
  • Client attains initial replication level when writing blobs (by writing to multiple Keep servers)
  • Data manager decides which blobs to delete (e.g., garbage collection, rebalancing)
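One way a client might derive a server order from the content hash and pick write targets is sketched below. The page only says the client distributes data "using the content hash"; the rendezvous-style ordering, the function names, and the default of two replicas are all assumptions made for illustration.

```python
import hashlib
from typing import List, Sequence


def probe_order(blob_hash: str, servers: Sequence[str]) -> List[str]:
    """Order servers deterministically for one blob (rendezvous-style).

    Every client that knows the same server list computes the same order,
    so readers look first where writers wrote first.
    """
    def weight(server: str) -> str:
        return hashlib.md5((blob_hash + server).encode()).hexdigest()

    return sorted(servers, key=weight)


def write_targets(blob_hash: str, servers: Sequence[str], replicas: int = 2) -> List[str]:
    """Pick the first `replicas` servers in probe order as write targets."""
    return probe_order(blob_hash, servers)[:replicas]
```

Because the order depends only on the hash and the server names, adding or removing one server leaves the relative order of the others unchanged for any given blob.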

Discovering Keep server URIs

Supported methods

For storage clients:
  • GET /hash
  • GET /hash?checksum=true → verify checksum before sending
  • POST / (body=content) → hash
  • PUT /hash (body=content) → hash
  • HEAD /hash → does it exist here?
  • HEAD /hash?checksum=true → read the data and verify checksum
For system (monitoring, indexing, garbage collection):
  • DELETE /hash → delete all copies of this blob (requires privileged token!)
  • GET /index.txt → get full list of blocks stored here, including size [and whether it was PUT recently?] (requires privileged token?)
  • GET /state.json → get list of backing filesystems, disk fullness, IO counters, perhaps recent IO statistics (requires privileged token?)
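The request shapes listed above can be illustrated by building (but not sending) HTTP requests with Python's standard library. This is only a sketch: the helper name keep_request and the host keep0.example are hypothetical, and nothing here reflects an actual client library.

```python
import urllib.request


def keep_request(base_url, method, blob_hash="", body=None, checksum=False):
    """Build an unsent request for one of the methods listed above.

    blob_hash is empty for "POST /"; checksum=True appends ?checksum=true,
    which the list above defines only for GET and HEAD.
    """
    if checksum and method not in ("GET", "HEAD"):
        raise ValueError("checksum=true is only listed for GET and HEAD")
    url = base_url.rstrip("/") + "/" + blob_hash
    if checksum:
        url += "?checksum=true"
    return urllib.request.Request(url, data=body, method=method)
```

For example, keep_request("http://keep0.example", "PUT", some_hash, body=content) corresponds to "PUT /hash (body=content)", and passing the result to urllib.request.urlopen would actually send it.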

Authentication

  • Client provides API token in Authorization header
  • Config knob to ignore authentication & permissions (for a fully-shared site, and to help the transition from Keep1)
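On the server side, the two points above might combine as in the sketch below. The header scheme ("OAuth2 <token>") is an assumption, since this page only says the token travels in the Authorization header; the function name and the enforce flag (standing in for the config knob) are likewise hypothetical.

```python
def api_token(headers, enforce=True):
    """Return the client's API token from the request headers.

    With enforce=False (the "ignore authentication" knob) no token is
    required and None is returned.
    """
    if not enforce:
        return None
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    token = token.strip()
    if scheme != "OAuth2" or not token:  # assumed scheme name
        raise PermissionError("missing or malformed Authorization header")
    return token
```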

Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time}, establishes permission to read a block.

The controller and each Keep server each have a private key. The public keys can be known to everyone, but only the controller and the Keep servers need them; clients don't need to verify signatures.

Writing:
  • If the given hash and content agree, whether or not a disk write is required, Keep server appends a +Asignature@expirytime portion to the returned blob locator.
  • The API server collections.create method verifies signatures before giving the current user can_read permission on the collection.
  • A suitably intelligent client can notice that the expirytimes on its blob locators are getting old and refresh them: generate a partial manifest, call collections.create followed by collections.get, and optionally delete the partial manifest(s) once the full manifest is written. Any partial manifests left behind should eventually be removed by garbage collection, so their temporary existence is the only odd side effect. (Should there just be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)
Reading:
  • The API server collections.get method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a +Asignature@expirytime portion on each blob locator.
  • Keep server verifies signatures before honoring GET requests.
  • The signature may come from the Keep node itself, a different Keep node, or the API server.
  • A suitably intelligent client can notice that the expirytime on its blob hashes is too old, and request a fresh set via collections.get.
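The signing and verification flow above can be sketched as follows. Note the substitution: this page describes private/public key signatures, but Python's standard library has no asymmetric signing, so the sketch uses HMAC-SHA1 with a shared secret instead. The function names, the message layout, and the use of a decimal Unix timestamp for expirytime are all assumptions.

```python
import hashlib
import hmac
import time


def sign_locator(secret, blob_hash, api_token, expiry):
    """Return blob_hash+Asignature@expirytime for this token and expiry."""
    message = "{}@{}@{}".format(blob_hash, api_token, expiry).encode()
    signature = hmac.new(secret, message, hashlib.sha1).hexdigest()
    return "{}+A{}@{}".format(blob_hash, signature, expiry)


def verify_locator(secret, locator, api_token, now=None):
    """Check that the locator is unexpired and was signed for this token."""
    now = time.time() if now is None else now
    blob_hash, sep, hint = locator.partition("+A")
    _signature, at, expiry_text = hint.partition("@")
    if not sep or not at or not expiry_text.isdecimal():
        return False
    expiry = int(expiry_text)
    if expiry < now:
        return False
    expected = sign_locator(secret, blob_hash, api_token, expiry)
    return hmac.compare_digest(expected, locator)


def needs_refresh(locator, now, margin=3600):
    """Client-side check: does the signature expire within `margin` seconds?"""
    expiry = int(locator.rpartition("@")[2])
    return expiry - now < margin
```

Because the signature covers the blob hash, the API token, and the expiry time together, a locator handed to one client is of no use to a client holding a different token, and changing the expiry time invalidates the signature.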

Updated by Tom Clegg about 10 years ago · 2 revisions