Keep server » History » Revision 6
Revision 6/13 (Tim Pierce, 04/04/2014 01:49 PM)
h1. Keep 2.0

This page specifies a design for version 2.0 of the Keep backing store server component, keepd.

{{toc}}

See also:

* [[Keep manifest format]]
* [[Keep index]]
* source:services/keep (implementation: in progress)

h2. Design Goals

h3. Content-addressable storage

Keep implements a "content-addressable filesystem":http://en.wikipedia.org/wiki/Content-addressable_storage. An object stored in Keep is identified by a hash of its content; it is not possible for two objects in Keep to have the same content but different identifiers.

h3. Fault tolerance

Keep double-checks the content hash of an object on both reads and writes, to protect against data corruption on the network or on disk.

h2. Todo

* Implement server daemon (*in progress*)
* Implement integration test suite (*in progress*)
* Spec public/private key format and deployment mechanism
* Spec permission signature format
* Spec event-reporting API
* Spec quota mechanism

h2. Responsibilities

* Read and write blobs on disk
* Enforce maximum blob size
* Enforce key=hash(value) during read and write
* Enforce permissions when reading data (according to permissions on Collections in the metadata DB)
* Enforce usage quota when writing data
* Delete blobs (only when requested by the data manager!)
* Report read/write/exception events
* Report free space
* Report hardware status (SMART)

h2. Other parties

* The client distributes data across the available Keep servers (using the content hash)
* The client attains the initial replication level when writing blobs (by writing to multiple Keep servers)
* The data manager decides which blobs to delete (e.g., garbage collection, rebalancing)

h2. Discovering Keep server URIs

* @GET https://endpoint/arvados/v1/keep_disks@
* See http://doc.arvados.org/api/schema/KeepDisk.html
* Currently the "list of Keep servers" is the "list of unique {host,port} pairs across all Keep disks". (Could surely be improved.)

h2. Supported methods

For storage clients:

* @GET /hash@
* @GET /hash?checksum=true@ → verify checksum before sending
* @POST /@ (body=content) → hash
* @PUT /hash@ (body=content) → hash
* @HEAD /hash@ → does it exist here?
* @HEAD /hash?checksum=true@ → read the data and verify checksum

For the system (monitoring, indexing, garbage collection):

* @DELETE /hash@ → delete all copies of this blob (requires privileged token!)
* @GET /index.txt@ → get the full list of blocks stored here, including sizes [and whether each was PUT recently?] (requires privileged token)
* @GET /state.json@ → get the list of backing filesystems, disk fullness, IO counters, perhaps recent IO statistics (requires privileged token)

h2. Authentication

* The client provides an API token in the Authorization header
* Config knob to ignore authentication & permissions (for a fully-shared site, and to help the transition from Keep1)

h2. Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time} tuple, establishes permission to read a block.

The controller and each Keep server have a private key. Everyone can know the public keys (but only the controller and the Keep servers need to know them; clients don't need to verify signatures).

Writing:

* If the given hash and content agree, whether or not a disk write is required, the Keep server appends a @+Asignature@expirytime@ portion to the returned blob locator.
* The API server's @collections.create@ method verifies signatures before giving the current user can_read permission on the collection.
* A suitably intelligent client can notice that the expirytimes on its blob hashes are getting old, and refresh them by generating a partial manifest, calling @collections.create@ followed by @collections.get@, and optionally deleting the partial manifest(s) when the full manifest is written. If extra partial manifests are left around, garbage collection will take care of them eventually; the only odd side effect is the existence of the partial manifests.
*(Should there be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)*

Reading:

* The API server's @collections.get@ method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a @+Asignature@expirytime@ portion on each blob locator.
* The Keep server verifies signatures before honoring @GET@ requests.
* The signature might come from the Keep node itself, a different Keep node, or the API server.
* A suitably intelligent client can notice that the expirytime on its blob hashes is too old, and request a fresh set via @collections.get@.