Keep server » History » Revision 5
Tim Pierce, 04/04/2014 01:42 PM
Keep 2.0
This page specifies a design for version 2.0 of the Keep backing store server component, keepd.
- Table of contents
- Keep 2.0
- Keep manifest format
- Keep index
- source:services/keep (implementation: in progress)
Design Goals
Content-addressable storage
Keep implements a content-addressable filesystem (http://en.wikipedia.org/wiki/Content-addressable_storage). An object stored in Keep is identified by a hash of its content; it is not possible for two objects in Keep to have the same content but different identifiers.
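As a minimal sketch of content addressing: the identifier is computed from the bytes alone, so identical content always yields the same identifier. The function name `blob_id` is hypothetical, and MD5 is an assumption made for illustration; this page does not name the hash function.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return a content-derived identifier for a blob.

    Assumption: MD5 hex digest. The design page does not specify the hash.
    """
    return hashlib.md5(content).hexdigest()


# Identical content maps to one identifier; different content does not.
same_a = blob_id(b"foo")
same_b = blob_id(b"foo")
other = blob_id(b"bar")
```

Because the identifier depends only on the content, two clients that store the same bytes independently end up referring to the same object.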
Fault tolerance
Keep double-checks the content hash of an object on both reads and writes, to protect against data corruption on the network or on disk.
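The double check could look roughly like the sketch below: an in-memory stand-in for the on-disk store that recomputes the hash on every write and every read. The class and exception names are hypothetical, and MD5 is assumed as the hash function since the page does not specify one.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when content does not match the hash it is stored under."""


class VerifyingStore:
    def __init__(self):
        self._blobs = {}  # stand-in for blobs on disk

    def put(self, claimed_hash: str, content: bytes) -> str:
        # Write path: reject data whose hash differs from the one claimed,
        # e.g. because it was corrupted on the network.
        actual = hashlib.md5(content).hexdigest()
        if actual != claimed_hash:
            raise HashMismatch(f"expected {claimed_hash}, got {actual}")
        self._blobs[actual] = content
        return actual

    def get(self, blob_hash: str) -> bytes:
        # Read path: recompute the hash to detect corruption at rest.
        content = self._blobs[blob_hash]
        actual = hashlib.md5(content).hexdigest()
        if actual != blob_hash:
            raise HashMismatch(f"expected {blob_hash}, got {actual}")
        return content
```

A read that raises `HashMismatch` tells the client to try another replica rather than accept damaged data.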
Todo
- Implement server daemon (in progress)
- Implement integration test suite (in progress)
- Spec public/private key format and deployment mechanism
- Spec permission signature format
- Spec event-reporting API
- Spec quota mechanism
Responsibilities
- Read and write blobs on disk
- Enforce maximum blob size
- Enforce key=hash(value) during read and write
- Enforce permissions when reading data (according to permissions on Collections in the metadata DB)
- Enforce usage quota when writing data
- Delete blobs (only when requested by data manager!)
- Report read/write/exception events
- Report free space
- Report hardware status (SMART)
Other parties
- Client distributes data across the available Keep servers (using the content hash)
- Client attains initial replication level when writing blobs (by writing to multiple Keep servers)
- Data manager decides which blobs to delete (e.g., garbage collection, rebalancing)
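The client-side duties above can be sketched as follows. This page does not say how the content hash selects servers, so the sketch assumes rendezvous (highest-random-weight) hashing as one plausible scheme; `probe_order`, `write_replicated`, and the `put` callback are hypothetical names.

```python
import hashlib


def probe_order(blob_hash, servers):
    """Order servers deterministically for a given blob hash.

    Assumption: each server is weighted by md5(blob_hash + server), so every
    client computes the same order without coordination.
    """
    def weight(server):
        return hashlib.md5((blob_hash + server).encode()).hexdigest()

    return sorted(servers, key=weight)


def write_replicated(blob_hash, content, servers, replicas, put):
    """Write to servers in probe order until `replicas` copies succeed.

    `put(server, blob_hash, content)` is a caller-supplied function that
    raises OSError when a server cannot store the blob.
    """
    written = []
    for server in probe_order(blob_hash, servers):
        if len(written) >= replicas:
            break
        try:
            put(server, blob_hash, content)
        except OSError:
            continue  # skip the failed server and try the next one
        written.append(server)
    if len(written) < replicas:
        raise RuntimeError(f"only {len(written)} of {replicas} copies written")
    return written
```

A reader can walk the same probe order to find the servers most likely to hold a blob.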
Discovering Keep server URIs
GET https://endpoint/arvados/v1/keep_disks
- see http://doc.arvados.org/api/schema/KeepDisk.html
- Currently, the list of Keep servers is derived as the set of unique {host,port} pairs across all Keep disks. (This could surely be improved.)
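Deriving the server list from the disk list might look like the sketch below. The field names `service_host` and `service_port`, and the sample records, are assumptions made for illustration; check the KeepDisk schema linked above for the actual attributes.

```python
def keep_servers(keep_disks):
    """Return unique (host, port) pairs, preserving first-seen order."""
    seen = set()
    servers = []
    for disk in keep_disks:
        # Field names are assumed, not taken from this page.
        key = (disk["service_host"], disk["service_port"])
        if key not in seen:
            seen.add(key)
            servers.append(key)
    return servers


# Hypothetical response items: two disks on one server, one on another.
sample_disks = [
    {"service_host": "keep0.example", "service_port": 25107},
    {"service_host": "keep0.example", "service_port": 25107},
    {"service_host": "keep1.example", "service_port": 25107},
]
servers = keep_servers(sample_disks)
```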
Supported methods
For storage clients
- GET /hash
- GET /hash?checksum=true → verify checksum before sending
- POST / (body=content) → hash
- PUT /hash (body=content) → hash
- HEAD /hash → does it exist here?
- HEAD /hash?checksum=true → read the data and verify checksum
- DELETE /hash → delete all copies of this blob (requires privileged token!)
- GET /index.txt → get full list of blocks stored here, including size [and whether it was PUT recently?] (requires privileged token)
- GET /state.json → get list of backing filesystems, disk fullness, IO counters, perhaps recent IO statistics (requires privileged token)
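One way to read the method list is as a routing table. The sketch below maps a request to an action, whether the checksum must be verified, and whether a privileged token is required. The `Route` tuple, the action names, and `classify` are hypothetical and are not part of the Keep API.

```python
from collections import namedtuple
from urllib.parse import parse_qs, urlsplit

Route = namedtuple("Route", ["action", "verify_checksum", "privileged"])


def classify(method, url):
    """Map an HTTP method and request URL to a Route, per the list above."""
    parts = urlsplit(url)
    path = parts.path
    verify = parse_qs(parts.query).get("checksum") == ["true"]

    if method == "GET" and path == "/index.txt":
        return Route("index", False, True)
    if method == "GET" and path == "/state.json":
        return Route("state", False, True)
    if method == "POST" and path == "/":
        return Route("write", False, False)
    if path in ("", "/"):
        raise ValueError(f"unsupported request: {method} {url}")

    # Everything else addresses a single blob as /hash.
    if method == "GET":
        return Route("read", verify, False)
    if method == "HEAD":
        return Route("exists", verify, False)
    if method == "PUT":
        return Route("write", False, False)
    if method == "DELETE":
        return Route("delete", False, True)
    raise ValueError(f"unsupported request: {method} {url}")
```

For example, `classify("GET", "/acbd18db4cc2f85cedef654fccc4a4d8?checksum=true")` yields a read that verifies the checksum before sending.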
Authentication
- Client provides API token in Authorization header
- Config knob to ignore authentication and permissions (for fully-shared sites, and to ease the transition from Keep1)
Permission
A signature token, unique to a {blob_hash, arvados_api_token, expiry_time} tuple, establishes permission to read a block.
The controller and each Keep server each have a private key. The public keys may be known to everyone, but only the controller and the Keep servers need them; clients don't need to verify signatures.
Writing:
- If the given hash and content agree, whether or not a disk write is required, Keep server adds a +Asignature@expirytime portion to the returned blob locator.
- The API server collections.create method verifies signatures before giving the current user can_read permission on the collection.
- A suitably intelligent client can notice that the expirytimes on its blob hashes are getting old, and refresh them by generating a partial manifest, calling collections.create followed by collections.get, and optionally deleting the partial manifest(s) when the full manifest is written. If extra partial manifests are left around, garbage collection will take care of them eventually; the only odd side effect is the existence of partial manifests. (Should there be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)
Reading:
- The API server collections.get method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a +Asignature@expirytime portion on each blob locator.
- Keep server verifies signatures before honoring GET requests.
- The signature might come from the Keep node itself, a different Keep node, or the API server.
- A suitably intelligent client can notice that the expirytime on its blob hashes is too old, and request a fresh set via collections.get.
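The signature format is still listed under Todo, so the following is only an illustration of the idea, not the design. It substitutes an HMAC over a shared secret for the key-pair scheme described above, and assumes the expiry is a decimal Unix timestamp. The function names and the message layout are hypothetical.

```python
import hashlib
import hmac
import time


def sign_locator(secret: bytes, blob_hash: str, api_token: str, expiry: int) -> str:
    """Return blob_hash+Asignature@expirytime for one token and expiry."""
    # The signature binds all three of {blob_hash, api_token, expiry_time}.
    message = f"{blob_hash}@{api_token}@{expiry}".encode()
    signature = hmac.new(secret, message, hashlib.sha1).hexdigest()
    return f"{blob_hash}+A{signature}@{expiry}"


def verify_locator(secret: bytes, locator: str, api_token: str, now=None) -> bool:
    """Check that a locator is signed for this token and has not expired."""
    now = time.time() if now is None else now
    blob_hash, sep, hint = locator.partition("+A")
    if not sep:
        return False  # no signature portion at all
    _signature, sep, expiry_text = hint.partition("@")
    if not sep:
        return False
    try:
        expiry = int(expiry_text)
    except ValueError:
        return False
    if expiry < now:
        return False  # signature has expired
    expected = sign_locator(secret, blob_hash, api_token, expiry)
    # Constant-time comparison of the full locator strings.
    return hmac.compare_digest(expected.encode(), locator.encode())
```

Because the API token is part of the signed message, a locator handed to one user cannot be replayed by another, and a client holding an expiring locator has to obtain a fresh one before reading.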
Updated by Tim Pierce over 10 years ago · 13 revisions