Keep server » History » Version 6

Tim Pierce, 04/04/2014 01:49 PM

h1. Keep 2.0

This page specifies a design for version 2.0 of the Keep backing store server component, keepd.

{{toc}}

See also:
* [[Keep manifest format]]
* [[Keep index]]
* source:services/keep (implementation: in progress)

h2. Design Goals

h3. Content-addressable storage

Keep implements a "content-addressable filesystem":http://en.wikipedia.org/wiki/Content-addressable_storage. An object stored in Keep is identified by a hash of its content; it is not possible for two objects in Keep to have the same content but different identifiers.
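
As a rough illustration of content addressing, the sketch below derives an identifier from the content alone. It assumes an MD5 hex digest as the hash; this page does not name the hash function, and @locator@ is a hypothetical helper.

```python
import hashlib


def locator(content: bytes) -> str:
    """Return an identifier derived only from the content.

    Assumption: MD5 hex digest; the design page does not specify the hash.
    """
    return hashlib.md5(content).hexdigest()


first = locator(b"foo")
second = locator(b"foo")   # same content, stored a second time
other = locator(b"bar")    # different content

# Identical content always maps to the same identifier.
print(first == second, first == other)
```

Because the identifier is a pure function of the content, storing the same data twice cannot produce two different identifiers.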

h3. Fault tolerance

Keep double-checks the content hash of an object on both reads and writes, to protect against data corruption on the network or on disk.
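
The double check can be sketched with a toy in-memory store that recomputes the hash on every write and every read. This is only an illustration: @VerifyingStore@ and @CorruptBlockError@ are hypothetical names, MD5 is an assumed hash, and a real server would read from and write to disk.

```python
import hashlib


class CorruptBlockError(Exception):
    """Raised when content does not match its expected hash."""


class VerifyingStore:
    """Toy in-memory store (hypothetical) that checks hashes on put and get."""

    def __init__(self):
        self._blocks = {}

    def put(self, expected_hash: str, content: bytes) -> str:
        # Reject content that was damaged in transit before storing it.
        if hashlib.md5(content).hexdigest() != expected_hash:
            raise CorruptBlockError("content does not match hash on write")
        self._blocks[expected_hash] = content
        return expected_hash

    def get(self, block_hash: str) -> bytes:
        content = self._blocks[block_hash]
        # Re-check on the way out to catch corruption at rest.
        if hashlib.md5(content).hexdigest() != block_hash:
            raise CorruptBlockError("content does not match hash on read")
        return content


store = VerifyingStore()
key = store.put(hashlib.md5(b"foo").hexdigest(), b"foo")
data = store.get(key)
```

A write with a mismatched hash fails immediately, and a block that changes after it was stored fails on the next read.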

h2. Todo

* Implement server daemon (*in progress*)
* Implement integration test suite (*in progress*)
* Spec public/private key format and deployment mechanism
* Spec permission signature format
* Spec event-reporting API
* Spec quota mechanism

h2. Responsibilities

* Read and write blobs on disk
* Enforce maximum blob size
* Enforce key=hash(value) during read and write
* Enforce permissions when reading data (according to permissions on Collections in the metadata DB)
* Enforce usage quota when writing data
* Delete blobs (only when requested by data manager!)
* Report read/write/exception events
* Report free space
* Report hardware status (SMART)

h2. Other parties

* Client distributes data across the available Keep servers (using the content hash)
* Client attains initial replication level when writing blobs (by writing to multiple Keep servers)
* Data manager decides which blobs to delete (e.g., garbage collection, rebalancing)
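
One way a client might turn a content hash into a list of servers to write to is sketched below. It uses rendezvous (highest-random-weight) hashing as a stand-in; this page does not specify the distribution algorithm, and @probe_order@, @write_targets@ and the server URIs are hypothetical.

```python
import hashlib


def probe_order(block_hash: str, servers: list) -> list:
    """Order servers for a block so every client agrees on the same order.

    Assumption: rendezvous hashing; the page only says that the content
    hash is used to distribute data.
    """
    def weight(server: str) -> str:
        return hashlib.md5((block_hash + server).encode()).hexdigest()

    return sorted(servers, key=weight)


def write_targets(block_hash: str, servers: list, replicas: int = 2) -> list:
    """Pick the first `replicas` servers to reach the initial replication level."""
    return probe_order(block_hash, servers)[:replicas]


servers = ["http://keep0:25107", "http://keep1:25107", "http://keep2:25107"]
block = hashlib.md5(b"foo").hexdigest()
targets = write_targets(block, servers)
```

Because the order depends only on the block hash and the server names, a reader can probe the servers in the same order the writer used.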

h2. Discovering Keep server URIs

* @GET https://endpoint/arvados/v1/keep_disks@
* see http://doc.arvados.org/api/schema/KeepDisk.html
* Currently "list of Keep servers" is "list of unique {host,port} across all Keep disks". (Could surely be improved.)
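
The "unique {host,port}" rule can be sketched as a small de-duplication step over the disk list. The field names @service_host@, @service_port@ and @service_ssl_flag@ are assumptions made for this example and should be checked against the KeepDisk schema; @keep_server_uris@ is a hypothetical helper.

```python
def keep_server_uris(keep_disks: list) -> list:
    """Collapse a list of Keep disk records into unique server URIs.

    Assumption: each record carries service_host, service_port and
    service_ssl_flag; verify these names against the KeepDisk schema.
    """
    seen = set()
    uris = []
    for disk in keep_disks:
        key = (disk["service_host"], disk["service_port"])
        if key in seen:
            continue  # several disks can sit behind one server
        seen.add(key)
        scheme = "https" if disk.get("service_ssl_flag") else "http"
        uris.append(f"{scheme}://{key[0]}:{key[1]}")
    return uris


disks = [
    {"service_host": "keep0", "service_port": 25107, "service_ssl_flag": False},
    {"service_host": "keep0", "service_port": 25107, "service_ssl_flag": False},
    {"service_host": "keep1", "service_port": 25107, "service_ssl_flag": True},
]
uris = keep_server_uris(disks)
```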

h2. Supported methods

For storage clients:
* GET /hash
* GET /hash?checksum=true → verify checksum before sending
* POST / (body=content) → hash
* PUT /hash (body=content) → hash
* HEAD /hash → does it exist here?
* HEAD /hash?checksum=true → read the data and verify checksum

For system (monitoring, indexing, garbage collection):
* DELETE /hash → delete all copies of this blob (requires privileged token!)
* GET /index.txt → get full list of blocks stored here, including size [and whether it was PUT recently?] (requires privileged token)
* GET /state.json → get list of backing filesystems, disk fullness, IO counters, perhaps recent IO statistics (requires privileged token)
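
The method table above can be read as a dispatch function. The sketch below maps a request to a hypothetical action name, whether a privileged token is required, and whether the checksum should be verified. It assumes that hashes are 32 lowercase hex digits; the action names and @classify@ are illustrative only.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumption: a block hash is 32 lowercase hex digits (e.g. an MD5 digest).
HASH_PATH = re.compile(r"^/[0-9a-f]{32}$")

# method -> (action, requires privileged token)
HASH_ACTIONS = {
    "GET": ("read", False),
    "PUT": ("write", False),
    "HEAD": ("exists", False),
    "DELETE": ("delete", True),
}

# path -> (action, requires privileged token), GET only
SYSTEM_ACTIONS = {
    "/index.txt": ("index", True),
    "/state.json": ("state", True),
}


def classify(method: str, url: str):
    """Return (action, privileged, verify_checksum), or None if unsupported."""
    parts = urlsplit(url)
    if method == "POST" and parts.path == "/":
        return ("write", False, False)
    if method == "GET" and parts.path in SYSTEM_ACTIONS:
        action, privileged = SYSTEM_ACTIONS[parts.path]
        return (action, privileged, False)
    if HASH_PATH.match(parts.path) and method in HASH_ACTIONS:
        action, privileged = HASH_ACTIONS[method]
        wants_checksum = parse_qs(parts.query).get("checksum") == ["true"]
        verify = method in ("GET", "HEAD") and wants_checksum
        return (action, privileged, verify)
    return None
```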

h2. Authentication

* Client provides API token in Authorization header
* Config knob to ignore authentication and permissions (for a fully shared site, and to ease the transition from Keep1)

h2. Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time}, establishes permission to read a block.

The controller and the Keep servers each have a private key. The public keys can be known to everyone, but only the controller and Keep servers need to know them; clients don't need to verify signatures.

Writing:
* If the given hash and content agree, whether or not a disk write is required, Keep server appends a <code>+Asignature@expirytime</code> portion to the returned blob locator.
* The API server @collections.create@ method verifies signatures before giving the current user can_read permission on the collection.
* A suitably intelligent client can notice that the expirytimes on its blob hashes are getting old, and refresh them by generating a partial manifest, calling @collections.create@ followed by @collections.get@, and optionally deleting the partial manifest(s) when the full manifest is written. If extra partial manifests are left around, garbage collection will take care of them eventually; the only odd side effect is the existence of partial manifests. *(Should there be a separate "refresh all of these tokens for me" API call to avoid creating these intermediate manifests?)*
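
The client-side check for aging signatures might look like the sketch below. The signature format is still on the Todo list, so the details here are assumptions: locator hints are separated by <code>+</code>, the expiry time is a hexadecimal Unix timestamp, and @parse_signed_locator@ and @needs_refresh@ are hypothetical helpers.

```python
def parse_signed_locator(locator: str):
    """Split 'hash+Asignature@expirytime' into (hash, signature, expiry).

    Assumption: expirytime is a hexadecimal Unix timestamp.
    Returns (hash, None, None) when no signature hint is present.
    """
    block_hash, *hints = locator.split("+")
    for hint in hints:
        if hint.startswith("A") and "@" in hint:
            signature, expiry_hex = hint[1:].split("@", 1)
            return block_hash, signature, int(expiry_hex, 16)
    return block_hash, None, None


def needs_refresh(locator: str, now: int, margin: int = 3600) -> bool:
    """True when the locator is unsigned or expires within `margin` seconds."""
    _, signature, expiry = parse_signed_locator(locator)
    return signature is None or expiry - now < margin


signed = "acbd18db4cc2f85cedef654fccc4a4d8+Adeadbeef@5340a0f0"
parsed = parse_signed_locator(signed)
```

A client could run @needs_refresh@ over the locators in its working manifest and start the partial-manifest round trip described above only when one returns true.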

Reading:
* The API server @collections.get@ method returns two manifests. One has plain hashes (this is the one whose content hash is the collection UUID). The other has a <code>+Asignature@expirytime</code> portion on each blob locator.
* Keep server verifies signatures before honoring @GET@ requests.
* The signature might come from the Keep node itself, a different Keep node, or the API server.
* A suitably intelligent client can notice that the expirytime on its blob hashes is too old, and request a fresh set via @collections.get@.
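
To make the signing and verification flow concrete, here is a minimal sketch that binds a signature to {blob_hash, arvados_api_token, expiry_time}. It deliberately swaps in HMAC-SHA1 with a shared secret for the public/private key scheme described above, because the key and signature formats are not yet specified. The function names, the message layout and the hexadecimal expiry encoding are all assumptions.

```python
import hashlib
import hmac


def sign(secret: bytes, block_hash: str, api_token: str, expiry: int) -> str:
    """Signature over {blob_hash, arvados_api_token, expiry_time}.

    Assumption: HMAC-SHA1 with a shared secret stands in for the
    unspecified public/private key signature.
    """
    message = f"{block_hash}:{api_token}:{expiry:x}".encode()
    return hmac.new(secret, message, hashlib.sha1).hexdigest()


def signed_locator(secret: bytes, block_hash: str, api_token: str, expiry: int) -> str:
    """Build 'hash+Asignature@expirytime' (expiry as hex, an assumption)."""
    return f"{block_hash}+A{sign(secret, block_hash, api_token, expiry)}@{expiry:x}"


def verify(secret: bytes, locator: str, api_token: str, now: int) -> bool:
    """Check that the locator is signed for this token and has not expired."""
    block_hash, _, hint = locator.partition("+A")
    signature, _, expiry_hex = hint.partition("@")
    if not signature or not expiry_hex:
        return False
    try:
        expiry = int(expiry_hex, 16)
    except ValueError:
        return False
    if expiry < now:
        return False
    expected = sign(secret, block_hash, api_token, expiry)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode(), expected.encode())


secret = b"example-shared-secret"
block = hashlib.md5(b"foo").hexdigest()
loc = signed_locator(secret, block, "example-token", 2000)
```

Since the API token is part of the signed message, a locator issued to one token does not verify when presented with another.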