Keep server » History » Version 6
Tim Pierce, 04/04/2014 01:49 PM

h1. Keep 2.0

This page specifies a design for version 2.0 of the Keep backing store server component, keepd.

{{toc}}

See also:
* [[Keep manifest format]]
* [[Keep index]]
* source:services/keep (implementation: in progress)

h2. Design Goals

h3. Content-addressable storage

Keep implements a "content-addressable filesystem":http://en.wikipedia.org/wiki/Content-addressable_storage. An object stored in Keep is identified by a hash of its content; it is not possible for two objects in Keep to have the same content but different identifiers.
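
As a minimal sketch of content addressing, the example below derives an identifier from the content alone. It assumes MD5 as the digest (this page does not name the hash function), and @BlobStore@ is a hypothetical in-memory class standing in for keepd's disk storage.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory content-addressed store (not keepd's real storage)."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # The identifier is derived only from the content (MD5 is an assumption).
        key = hashlib.md5(content).hexdigest()
        self._blobs[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = BlobStore()
first = store.put(b"hello")
second = store.put(b"hello")  # same content, so the same identifier
other = store.put(b"world")   # different content, so a different identifier
```

Because the key is a function of the content, writing the same bytes twice cannot produce two identifiers.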

h3. Fault tolerance

Keep double-checks the content hash of an object on both reads and writes, to protect against data corruption on the network or on disk.
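
The double check could look roughly like the sketch below. The function names, the @CorruptBlobError@ exception, and the use of a plain dict as storage are all hypothetical, and MD5 is again an assumption.

```python
import hashlib


class CorruptBlobError(Exception):
    """Raised when stored or received content does not match its hash."""


def content_hash(content: bytes) -> str:
    # MD5 is an assumption; the page only says "a hash of its content".
    return hashlib.md5(content).hexdigest()


def checked_write(storage: dict, claimed_hash: str, content: bytes) -> str:
    # Reject the write if the data was damaged on the network.
    if content_hash(content) != claimed_hash:
        raise CorruptBlobError("content does not match hash on write")
    storage[claimed_hash] = content
    return claimed_hash


def checked_read(storage: dict, wanted_hash: str) -> bytes:
    content = storage[wanted_hash]
    # Re-hash on the way out to catch corruption that happened on disk.
    if content_hash(content) != wanted_hash:
        raise CorruptBlobError("content does not match hash on read")
    return content
```

A caller that receives @CorruptBlobError@ on read could then retry against another server holding a replica.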

h2. Todo

* Implement server daemon (*in progress*)
* Implement integration test suite (*in progress*)
* Spec public/private key format and deployment mechanism
* Spec permission signature format
* Spec event-reporting API
* Spec quota mechanism

h2. Responsibilities

* Read and write blobs on disk
* Enforce maximum blob size
* Enforce key=hash(value) during read and write
* Enforce permissions when reading data (according to permissions on Collections in the metadata DB)
* Enforce usage quota when writing data
* Delete blobs (only when requested by data manager!)
* Report read/write/exception events
* Report free space
* Report hardware status (SMART)

h2. Other parties

* Client distributes data across the available Keep servers (using the content hash)
* Client attains initial replication level when writing blobs (by writing to multiple Keep servers)
* Data manager decides which blobs to delete (e.g., garbage collection, rebalancing)
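
This page does not specify how the client maps a content hash to servers. One possible scheme is rendezvous (highest-random-weight) hashing, sketched below; the function name, the server names, and the replica count are hypothetical.

```python
import hashlib


def servers_for_blob(blob_hash: str, servers: list, replicas: int = 2) -> list:
    """Pick `replicas` servers for a blob, ordered by a per-blob weight."""

    def weight(server: str) -> str:
        # Mixing the blob hash with the server name gives each blob its own
        # stable ordering of the servers.
        return hashlib.md5((blob_hash + server).encode()).hexdigest()

    return sorted(servers, key=weight)[:replicas]


# Hypothetical server list; the client would write the blob to each target.
servers = ["keep0.example:25107", "keep1.example:25107", "keep2.example:25107"]
targets = servers_for_blob("5d41402abc4b2a76b9719d911017c592", servers)
```

Every client that knows the same server list computes the same targets for a given hash, so readers can find a blob without a central lookup.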

h2. Discovering Keep server URIs

* @GET https://endpoint/arvados/v1/keep_disks@
* See http://doc.arvados.org/api/schema/KeepDisk.html
* Currently the list of Keep servers is the list of unique {host,port} pairs across all Keep disks. (This could surely be improved.)
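
Deriving the server list from the disk list amounts to de-duplicating on {host,port}. In the sketch below, the field names @service_host@ and @service_port@ and the sample records are assumptions; check the KeepDisk schema for the actual attribute names.

```python
def keep_servers(keep_disks: list) -> list:
    """Return the sorted unique (host, port) pairs across all Keep disks."""
    # Field names are assumed for illustration, not taken from this page.
    return sorted({(d["service_host"], d["service_port"]) for d in keep_disks})


# Hypothetical items, shaped like entries in a keep_disks listing.
disks = [
    {"uuid": "disk-a", "service_host": "keep0.example", "service_port": 25107},
    {"uuid": "disk-b", "service_host": "keep0.example", "service_port": 25107},
    {"uuid": "disk-c", "service_host": "keep1.example", "service_port": 25107},
]
servers = keep_servers(disks)
```

Two disks attached to the same server collapse into a single entry.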

h2. Supported methods

For storage clients
* GET /hash
* GET /hash?checksum=true → verify checksum before sending
* POST / (body=content) → hash
* PUT /hash (body=content) → hash
* HEAD /hash → does it exist here?
* HEAD /hash?checksum=true → read the data and verify checksum

For system (monitoring, indexing, garbage collection)
* DELETE /hash → delete all copies of this blob (requires privileged token!)
* GET /index.txt → get full list of blocks stored here, including size [and whether it was PUT recently?] (requires privileged token)
* GET /state.json → get list of backing filesystems, disk fullness, IO counters, perhaps recent IO statistics (requires privileged token)
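
The storage-client methods can be sketched as a single dispatch function. This is only an illustration: the status codes, the @handle@ signature, the dict-backed storage, and the MD5 digest are assumptions, and the @?checksum=true@ variants and the privileged methods are left out.

```python
import hashlib


def handle(storage: dict, method: str, path: str, body: bytes = b""):
    """Return (status, response_body) for a storage-client request."""
    blob_hash = path.lstrip("/")
    if method == "POST" and path == "/":
        # POST lets the server compute the hash and return it.
        blob_hash = hashlib.md5(body).hexdigest()
        storage[blob_hash] = body
        return 200, blob_hash.encode()
    if method == "PUT":
        # PUT names the hash up front, so the server must check it.
        if hashlib.md5(body).hexdigest() != blob_hash:
            return 422, b"hash mismatch"
        storage[blob_hash] = body
        return 200, blob_hash.encode()
    if method in ("GET", "HEAD"):
        if blob_hash not in storage:
            return 404, b""
        # HEAD only reports existence; GET also returns the content.
        return 200, (storage[blob_hash] if method == "GET" else b"")
    return 405, b""
```

Keeping the dispatch free of network code makes the hash checks easy to exercise in a test before wiring it to a real HTTP server.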

h2. Authentication

* Client provides API token in Authorization header
* Config knob to ignore authentication and permissions (for a fully shared site, and to help the transition from Keep1)
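
A rough sketch of the token check and the config knob follows. The @OAuth2@ scheme name, the @request_token@ function, and the @enforce_permissions@ flag are assumptions made for illustration; this page only says that the token travels in the Authorization header.

```python
def request_token(headers: dict, enforce_permissions: bool = True):
    """Return the API token from the request, or None when checks are disabled."""
    if not enforce_permissions:
        # Fully shared site: the knob turns authentication off entirely.
        return None
    value = headers.get("Authorization", "")
    # Assumed header shape: "<scheme> <token>", with "OAuth2" as the scheme.
    scheme, _, token = value.partition(" ")
    if scheme != "OAuth2" or not token:
        raise PermissionError("missing or malformed Authorization header")
    return token
```

With @enforce_permissions=False@ the function never inspects the header, which matches the intent of a knob for sites that do not need access control.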

h2. Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time} tuple, establishes permission to read a block.

The controller and each Keep server each have a private key. Everyone can know the public keys (but only the controller and Keep servers need to know them; clients don't need to verify signatures).

Writing:
* If the given hash and content agree, whether or not a disk write is required, the Keep server adds a +Asignature@expirytime portion to the returned blob locator.
* The API server @collections.create@ method verifies signatures before giving the current user can_read permission on the collection.
* A suitably intelligent client can notice that the expirytimes on its blob hashes are getting old, and refresh them by generating a partial manifest, calling @collections.create@ followed by @collections.get@, and optionally deleting the partial manifest(s) when the full manifest is written. If extra partial manifests are left around, garbage collection will take care of them eventually; the only odd side effect is the existence of partial manifests. *(Should there be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)*
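
A client could spot locators that are about to expire along the lines of the sketch below. The signature format is still unspecified (see Todo), so this assumes that expirytime is a decimal Unix timestamp; the function names and the one-hour margin are hypothetical.

```python
def expiry_of(locator: str):
    """Return the expiry time of a signed locator, or None if it is unsigned."""
    for hint in locator.split("+")[1:]:
        # Assumed hint shape: "A<signature>@<decimal unix time>".
        if hint.startswith("A") and "@" in hint:
            return int(hint.rsplit("@", 1)[1])
    return None


def needs_refresh(locators: list, now: int, margin: int = 3600) -> list:
    """Return the signed locators that expire within `margin` seconds of `now`."""
    stale = []
    for locator in locators:
        expiry = expiry_of(locator)
        if expiry is not None and expiry - now < margin:
            stale.append(locator)
    return stale
```

The client would then request fresh signatures only for the locators this returns, rather than for every block it holds.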

Reading:
* The API server @collections.get@ method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a @+Asignature@expirytime@ portion on each blob locator.
* Keep server verifies signatures before honoring @GET@ requests.
* The signature might come from the Keep node itself, a different Keep node, or the API server.
* A suitably intelligent client can notice that the expirytime on its blob hashes is too old, and request a fresh set via @collections.get@.
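
The sign-then-verify flow can be sketched as follows. Since the key and signature formats are not yet specified (see Todo), this substitutes an HMAC over a shared secret for the public/private-key signature described above. The function names, the message layout, and the decimal Unix expiry time are all assumptions.

```python
import hashlib
import hmac


def sign_locator(secret: bytes, blob_hash: str, api_token: str, expiry: int) -> str:
    """Return blob_hash+Asignature@expirytime for one {hash, token, expiry}."""
    # The signature covers all three values, so it is unique to the tuple.
    message = f"{blob_hash}@{api_token}@{expiry}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{blob_hash}+A{signature}@{expiry}"


def verify_locator(secret: bytes, locator: str, api_token: str, now: int) -> bool:
    """Check the signature and expiry before honoring a GET."""
    blob_hash, _, hint = locator.partition("+A")
    _signature, _, expiry_text = hint.partition("@")
    if not expiry_text.isdigit() or int(expiry_text) < now:
        return False  # unsigned, malformed, or expired
    # Recompute the locator and compare in constant time.
    expected = sign_locator(secret, blob_hash, api_token, int(expiry_text))
    return hmac.compare_digest(expected.encode(), locator.encode())
```

Because the API token is part of the signed message, a locator signed for one user does not verify when presented with another user's token.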