Tom Clegg, 02/04/2014 01:40 AM

h1. Keep server

This page describes the Keep backing store server component, keepd.

{{toc}}

See also:
* [[Keep manifest format]]
* [[Keep index]]
* source:services/keepd (implementation: imminent)

h2. Discovering Keep server URIs

* @GET https://endpoint/arvados/v1/keep_disks@
* See http://doc.arvados.org/api/schema/KeepDisk.html for the KeepDisk schema.
* Currently, the list of Keep servers is the list of unique {host, port} pairs across all Keep disks. (This could surely be improved.)

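The reduction from "all Keep disks" to "unique {host, port} pairs" can be sketched as follows. This is a minimal sketch that works on an already-parsed @keep_disks@ list response; it assumes an @items@ list whose entries carry @service_host@ and @service_port@ fields, so check those names against the KeepDisk schema before relying on them. The host names and port are made up.

```python
from typing import Any


def keep_servers(keep_disks_response: dict[str, Any]) -> list[tuple[str, int]]:
    """Reduce a keep_disks list response to the unique (host, port) pairs.

    ASSUMPTION: the response has an "items" list and each KeepDisk item
    carries "service_host" and "service_port" fields.
    """
    servers = {
        (disk["service_host"], int(disk["service_port"]))
        for disk in keep_disks_response.get("items", [])
    }
    # Sort so callers get a stable order (handy for logging and tests).
    return sorted(servers)


# Hypothetical response: two disks on keep0 collapse into one server entry.
response = {
    "items": [
        {"service_host": "keep0.example", "service_port": 25107},
        {"service_host": "keep0.example", "service_port": 25107},
        {"service_host": "keep1.example", "service_port": 25107},
    ]
}
print(keep_servers(response))  # prints [('keep0.example', 25107), ('keep1.example', 25107)]
```

Deduplicating by {host, port} means a server with many disks is contacted once; per-disk information such as fullness is lost, which is one reason this scheme "could surely be improved".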
h2. Supported methods

For storage clients:
* GET /hash → return the blob's content
* GET /hash?checksum=true → verify the checksum before sending
* POST / (body=content) → store the content; returns its hash
* PUT /hash (body=content) → store the content under the given hash; returns the hash
* HEAD /hash → does the blob exist on this server?
* HEAD /hash?checksum=true → read the data and verify its checksum

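The storage-client methods above can be modelled with a toy in-memory store, to show where hashing and checksum verification happen in each call. This is a sketch under assumptions, not keepd itself: it assumes blobs are addressed by the hex MD5 of their content (as in Keep locators), and the class and exception names are hypothetical. A real server would map @HashMismatch@ and a missing key to HTTP error responses.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when content does not hash to the locator it is stored or served under."""


class BlobStore:
    """Toy in-memory stand-in for one Keep server's storage-client methods.

    ASSUMPTION: blobs are addressed by the hex MD5 of their content.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def _hash(content: bytes) -> str:
        return hashlib.md5(content).hexdigest()

    def post(self, content: bytes) -> str:
        # POST / : the server computes the hash itself.
        h = self._hash(content)
        self._blobs[h] = content
        return h

    def put(self, h: str, content: bytes) -> str:
        # PUT /hash : refuse content that does not match the claimed hash.
        if self._hash(content) != h:
            raise HashMismatch(h)
        self._blobs[h] = content
        return h

    def head(self, h: str, checksum: bool = False) -> bool:
        # HEAD /hash : existence only; with checksum=True, also re-read and verify.
        content = self._blobs.get(h)
        if content is None:
            return False
        return not checksum or self._hash(content) == h

    def get(self, h: str, checksum: bool = False) -> bytes:
        # GET /hash : a KeyError here plays the role of a 404.
        content = self._blobs[h]
        if checksum and self._hash(content) != h:
            raise HashMismatch(h)
        return content
```

The @checksum=true@ variants cost a full read and hash of the blob, which is why they are opt-in: a plain HEAD only checks that the blob is present.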
For system services (monitoring, indexing, garbage collection):
* DELETE /hash → delete all copies of this blob (requires privileged token!)
* GET /index.txt → get the full list of blocks stored here, including their sizes [and whether each was PUT recently?] (requires privileged token?)
* GET /state.json → get the list of backing filesystems, disk fullness, IO counters, and perhaps recent IO statistics (requires privileged token?)

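One possible shape for the @/index.txt@ and @/state.json@ response bodies is sketched below. It assumes each backing filesystem stores one file per block, named by its 32-hex-digit hash; the index line format (@hash+size mtime@) and the JSON field names are hypothetical. Including the file's mtime is one way to answer the open "PUT recently?" question.

```python
import json
import os
import re
import shutil

# ASSUMPTION: one file per block, named by its 32-hex-digit hash.
BLOCK_NAME = re.compile(r"[0-9a-f]{32}")


def index_txt(volume_dir: str) -> str:
    """Body for GET /index.txt: one "<hash>+<size> <mtime>" line per block (hypothetical format)."""
    lines = []
    for name in sorted(os.listdir(volume_dir)):
        if not BLOCK_NAME.fullmatch(name):
            continue  # skip temp files, lock files, etc.
        st = os.stat(os.path.join(volume_dir, name))
        lines.append(f"{name}+{st.st_size} {int(st.st_mtime)}\n")
    return "".join(lines)


def state_json(volume_dirs: list[str]) -> str:
    """Body for GET /state.json: per-filesystem fullness only (no IO counters)."""
    volumes = []
    for path in volume_dirs:
        usage = shutil.disk_usage(path)
        volumes.append({"path": path, "bytes_total": usage.total, "bytes_free": usage.free})
    return json.dumps({"volumes": volumes})
```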
h2. Authentication

* The client provides its API token in the Authorization header.
* A configuration knob disables authentication and permission checks (for a fully shared site, and to ease the transition from Keep1).

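Token extraction together with the "ignore authentication" knob might look like the sketch below. The page does not specify the header's scheme: this sketch assumes @Authorization: OAuth2 <token>@, the form used by the Arvados API server. The @require_auth@ parameter is a hypothetical stand-in for the configuration knob.

```python
from typing import Mapping, Optional


class Unauthorized(Exception):
    """Would map to an HTTP 401 response."""


def api_token(headers: Mapping[str, str], require_auth: bool = True) -> Optional[str]:
    """Return the client's API token, or None if auth is disabled and none was sent.

    ASSUMPTION: the header looks like "Authorization: OAuth2 <token>".
    require_auth=False models the hypothetical "ignore authentication" knob.
    """
    # HTTP header names are case-insensitive.
    value = next((v for k, v in headers.items() if k.lower() == "authorization"), None)
    if value is None:
        if require_auth:
            raise Unauthorized("missing Authorization header")
        return None
    scheme, _, token = value.partition(" ")
    if scheme != "OAuth2" or not token.strip():
        if require_auth:
            raise Unauthorized("malformed Authorization header")
        return None
    return token.strip()
```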
h2. Permission

A signature token, unique to a {blob_hash, arvados_api_token, expiry_time} triple, establishes permission to read a block.

The controller and each Keep server have a private key. The public keys need not be kept secret, but only the controller and the Keep servers need to know them; clients never verify signatures.

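How a signature can be tied to the {blob_hash, arvados_api_token, expiry_time} triple is sketched below. For a stdlib-only sketch this swaps in an HMAC-SHA256 with a secret key in place of the public-key signatures described above. The message layout, the hex expiry encoding, and the function names are assumptions.

```python
import hashlib
import hmac


def make_signature(key: bytes, blob_hash: str, api_token: str, expiry: int) -> str:
    """Signature unique to {blob_hash, api_token, expiry_time}.

    ASSUMPTION: HMAC-SHA256 with a secret key stands in for a public-key
    signature; expiry is a Unix timestamp, encoded in hex.
    """
    message = f"{blob_hash}@{api_token}@{expiry:x}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_locator(key: bytes, locator: str, api_token: str, expiry: int) -> str:
    """Append a +A<signature>@<hex expiry> hint to a blob locator."""
    blob_hash = locator.split("+", 1)[0]  # drop any existing hints, e.g. +size
    sig = make_signature(key, blob_hash, api_token, expiry)
    return f"{locator}+A{sig}@{expiry:x}"
```

Because the API token is part of the signed message, a signed locator leaked to another user does not work with that user's token. The design choice between the two schemes matters: with HMAC, every verifier must hold the secret key, whereas with public-key signatures the verifiers need only the public keys.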
Writing:
* If the given hash and content agree, the Keep server appends a +Asignature@expirytime portion to the returned blob locator, whether or not a disk write was needed.
* The API server @collections.create@ method verifies the signatures before granting the current user can_read permission on the new collection.
* A suitably intelligent client can notice when the expiry times on its blob locators are getting close, and refresh them by saving a partial manifest with @collections.create@, fetching freshly signed locators with @collections.get@, and optionally deleting the partial manifest(s) once the full manifest is written. Garbage collection should eventually remove any partial manifests left behind; their temporary existence is the only odd side effect. *(Should there just be a separate "refresh all of these tokens for me" API call, to avoid creating these intermediate manifests?)*

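The client-side "are my signatures getting old?" check can be sketched like this. It assumes the hint format @+A<signature>@<hex expiry>@ and a one-hour safety margin; both are assumptions, and the refresh itself (via @collections.create@ and @collections.get@) is left out.

```python
import time
from typing import Optional


def expiry_of(signed_locator: str) -> Optional[int]:
    """Expiry timestamp from a +A<sig>@<hex expiry> hint, or None if unsigned (assumed format)."""
    for hint in signed_locator.split("+")[1:]:
        if hint.startswith("A") and "@" in hint:
            return int(hint.rsplit("@", 1)[1], 16)
    return None


def stale_locators(locators: list[str], margin_s: int = 3600, now: Optional[float] = None) -> list[str]:
    """Locators whose signature is missing or expires within margin_s seconds."""
    now = time.time() if now is None else now
    stale = []
    for loc in locators:
        exp = expiry_of(loc)
        if exp is None or exp - now < margin_s:
            stale.append(loc)
    return stale
```

A client would batch the stale locators into one partial manifest rather than refreshing them one at a time, which keeps the number of intermediate manifests small.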
Reading:
* The API server @collections.get@ method returns two manifests: one with plain hashes (the one whose content hash is the collection UUID), and one with a +Asignature@expirytime portion on each blob locator.
* The Keep server verifies signatures before honoring @GET@ requests.
* The signature may have been issued by the Keep node itself, by a different Keep node, or by the API server.
* A suitably intelligent client can notice when the expiry time on its blob locators has passed (or is about to), and request a fresh set via @collections.get@.