Keep » History » Version 3
Tom Clegg, 04/10/2013 10:21 PM
1 | 1 | Tom Clegg | h1. Keep |
---|---|---|---|
2 | 1 | Tom Clegg | |
3 | 2 | Tom Clegg | Keep is a distributed content-addressable storage system designed for high performance in I/O-bound cluster environments. |
4 | 2 | Tom Clegg | |
5 | 2 | Tom Clegg | Notable design goals and features include: |
6 | 2 | Tom Clegg | |
7 | 2 | Tom Clegg | * High scalability |
8 | 2 | Tom Clegg | * Node-level redundancy |
9 | 2 | Tom Clegg | * Maximum overall throughput in a busy cluster environment |
10 | 2 | Tom Clegg | * Maximum data bandwidth from client to disk |
11 | 2 | Tom Clegg | * Minimum transaction overhead |
12 | 2 | Tom Clegg | * Elimination of disk thrashing (commonly caused by multiple simultaneous readers) |
13 | 2 | Tom Clegg | * Client-controlled redundancy |
14 | 2 | Tom Clegg | |
15 | 2 | Tom Clegg | h2. Design |
16 | 2 | Tom Clegg | |
17 | 2 | Tom Clegg | The above goals are accomplished by the following design features. |
18 | 2 | Tom Clegg | |
19 | 2 | Tom Clegg | * Data is transferred directly between the client and the physical node where the disk is installed. |
20 | 2 | Tom Clegg | * Data collections are encoded in large (≤64 MiB) blocks to minimize short read/write operations. |
21 | 2 | Tom Clegg | * Each disk accepts only one block-read/write operation at a time. This prevents disk thrashing and maximizes total throughput when many clients compete for a disk. |
22 | 2 | Tom Clegg | * The client controls storage redundancy directly by writing a block of data to multiple nodes, and can easily verify it by reading the block back from each of those nodes. |
23 | 3 | Tom Clegg | * Data block distribution is computed based on a cryptographic digest of the data block being stored or retrieved. This eliminates the need for a central or synchronized database of block storage locations. |
24 | 2 | Tom Clegg | |
25 | 2 | Tom Clegg | h2. Components |
26 | 2 | Tom Clegg | |
27 | 1 | Tom Clegg | The Keep storage system consists of data block read/write services, SDKs, and management agents. |
28 | 1 | Tom Clegg | |
29 | 1 | Tom Clegg | The responsibilities of the Keep service are: |
30 | 1 | Tom Clegg | |
31 | 1 | Tom Clegg | * Write data blocks |
32 | 3 | Tom Clegg | * When writing: ensure data integrity by checking that the client-supplied cryptographic digest matches the digest of the data received |
33 | 1 | Tom Clegg | * Read data blocks (subject to permission, which is determined by the system/metadata DB) |
34 | 1 | Tom Clegg | * Send read/write/error event logs to management agents |
35 | 1 | Tom Clegg | |
36 | 1 | Tom Clegg | The responsibilities of the SDK are: |
37 | 1 | Tom Clegg | |
38 | 1 | Tom Clegg | * When writing: split data into ≤64 MiB chunks |
39 | 1 | Tom Clegg | * When writing: encode directory trees as manifests |
40 | 1 | Tom Clegg | * When writing: write data to the desired number of nodes to achieve storage redundancy |
41 | 1 | Tom Clegg | * After writing: register a collection with Arvados |
42 | 1 | Tom Clegg | * When reading: parse manifests |
43 | 1 | Tom Clegg | * When reading: verify data integrity by comparing the block's locator to the MD5 digest of the retrieved data |
44 | 3 | Tom Clegg | |
45 | 3 | Tom Clegg | The responsibilities of management agents are: |
46 | 3 | Tom Clegg | |
47 | 3 | Tom Clegg | * Verify validity of permission tokens |
48 | 3 | Tom Clegg | * Determine which blocks have higher or lower redundancy than required |
49 | 3 | Tom Clegg | * Monitor disk space and move or delete blocks as needed |
50 | 3 | Tom Clegg | * Collect per-user, per-group, per-node, and per-disk usage statistics |
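
The write and read paths described above can be illustrated with a minimal Python sketch. It is not the Keep implementation: @FakeNode@, @probe_order@, @write_block@, and @read_block@ are hypothetical names, the nodes are in-memory dictionaries rather than network services, and ranking nodes by a hash of the block digest plus the node name is only one assumed way to derive placement from a digest without a central database. The 64 MiB limit and the MD5 locator check come from the text.

```python
import hashlib

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB upper bound on block size, per the text


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield consecutive chunks of at most block_size bytes."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def locator(block: bytes) -> str:
    """Content address of a block: the MD5 hex digest of its bytes."""
    return hashlib.md5(block).hexdigest()


def probe_order(digest: str, node_names: list[str]) -> list[str]:
    """Rank nodes for a block using only the digest and the node names.

    ASSUMPTION: this ranking scheme is illustrative, not Keep's actual
    placement algorithm. Any client can recompute it, so no shared
    database of block locations is needed.
    """
    return sorted(
        node_names,
        key=lambda name: hashlib.md5((digest + name).encode()).hexdigest(),
    )


class FakeNode:
    """Hypothetical in-memory stand-in for one Keep storage service."""

    def __init__(self):
        self.blocks = {}

    def put(self, digest: str, block: bytes) -> None:
        # Service-side integrity check: the supplied digest must match the data.
        if locator(block) != digest:
            raise ValueError("client-supplied digest does not match data")
        self.blocks[digest] = block

    def get(self, digest: str):
        return self.blocks.get(digest)


def write_block(block: bytes, nodes: dict, replicas: int = 2) -> str:
    """Write one block to `replicas` nodes; the client chooses the redundancy."""
    digest = locator(block)
    for name in probe_order(digest, list(nodes))[:replicas]:
        nodes[name].put(digest, block)
    return digest


def read_block(digest: str, nodes: dict) -> bytes:
    """Fetch a block, trying nodes in probe order and verifying its digest."""
    for name in probe_order(digest, list(nodes)):
        block = nodes[name].get(digest)
        if block is not None and locator(block) == digest:
            return block
    raise KeyError(digest)
```

In this sketch a client would call @split_blocks@ on its data, pass each chunk to @write_block@ with the desired @replicas@ count, and later call @read_block@ with the returned locator. Because @probe_order@ depends only on the digest and the node names, a reader reaches the same nodes the writer chose without consulting a lookup service.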