Tom Clegg, 04/10/2013 10:21 PM

h1. Keep

Keep is a distributed content-addressable storage system designed for high performance in I/O-bound cluster environments.

Notable design goals and features include:

* High scalability
* Node-level redundancy
* Maximum overall throughput in a busy cluster environment
* Maximum data bandwidth from client to disk
* Minimum transaction overhead
* Elimination of disk thrashing (commonly caused by multiple simultaneous readers)
* Client-controlled redundancy

h2. Design

These goals are met by the following design features:

* Data is transferred directly between the client and the physical node where the disk is installed.
* Data collections are encoded in large blocks (up to 64 MiB each) to minimize short read/write operations.
* Each disk accepts only one block read or write operation at a time. This prevents disk thrashing and maximizes total throughput when many clients compete for a disk.
* The client controls storage redundancy directly by writing a block to multiple nodes, and can verify it just as easily by reading the block back from those nodes.
* The location of each data block is computed from a cryptographic digest of the block being stored or retrieved. This eliminates the need for a central or synchronized database of block storage locations.
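The digest-based placement described in the last point can be sketched as follows. This is a minimal illustration, not Keep's actual algorithm: the text only says that placement is computed from a cryptographic digest, so the ranking rule used here (rendezvous hashing), the function names @block_digest@ and @probe_order@, and the node names are all assumptions made for the example. MD5 is used because the SDK section refers to MD5 digests.

```python
import hashlib


def block_digest(data: bytes) -> str:
    """Return the hex MD5 digest that identifies a block."""
    return hashlib.md5(data).hexdigest()


def probe_order(digest: str, nodes: list[str]) -> list[str]:
    """Rank nodes for a block using only the digest and the node names.

    Assumption: rendezvous hashing. Each node gets a weight derived from
    (digest + node name); sorting by weight gives every client the same
    order without consulting any shared database.
    """
    def weight(node: str) -> str:
        return hashlib.md5((digest + node).encode("utf-8")).hexdigest()

    return sorted(nodes, key=weight, reverse=True)


# Hypothetical node names, for illustration only.
nodes = ["keep0", "keep1", "keep2", "keep3"]
digest = block_digest(b"example block")

# A client wanting two copies writes to the first two nodes in the order;
# a reader computes the same order and probes the same nodes first.
replica_nodes = probe_order(digest, nodes)[:2]
```

Because the order depends only on the digest and the list of node names, any client that knows the node list can find a block without asking a central index.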

h2. Components

The Keep storage system consists of data block read/write services, SDKs, and management agents.

The responsibilities of the Keep service are:

* Write data blocks
* When writing: ensure data integrity by checking that the data matches the client-supplied cryptographic digest
* Read data blocks (subject to permission, which is determined by the system/metadata DB)
* Send read/write/error event logs to management agents
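The write-time integrity check, together with the one-operation-per-disk rule from the design section, might look roughly like the sketch below. The @Volume@ class and @DigestMismatch@ exception are hypothetical names, a dict stands in for the disk, and MD5 is assumed as the digest. Permission checks and event logging are left out.

```python
import hashlib
import threading


class DigestMismatch(ValueError):
    """Raised when the data does not hash to the digest the client supplied."""


class Volume:
    """Hypothetical stand-in for one disk; a dict replaces real file storage."""

    def __init__(self):
        self._blocks = {}
        # One lock per disk, so only one block operation touches it at a time.
        self._lock = threading.Lock()

    def put(self, claimed_digest: str, data: bytes) -> str:
        """Store a block only if it matches the client-supplied digest."""
        actual = hashlib.md5(data).hexdigest()
        if actual != claimed_digest.lower():
            raise DigestMismatch(f"expected {claimed_digest}, got {actual}")
        with self._lock:
            self._blocks[actual] = data
        return actual

    def get(self, digest: str) -> bytes:
        """Return a stored block; raises KeyError if it is not on this disk."""
        with self._lock:
            return self._blocks[digest.lower()]
```

Hashing happens before the lock is taken, so the digest computation does not extend the time the disk is held.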

The responsibilities of the SDK are:

* When writing: split data into ≤64 MiB chunks
* When writing: encode directory trees as manifests
* When writing: write data to the desired number of nodes to achieve storage redundancy
* After writing: register a collection with Arvados
* When reading: parse manifests
* When reading: verify data integrity by comparing the locator to the MD5 digest of the retrieved data
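The chunking, redundant write, and verified read steps above can be sketched as below. This is an illustration under stated assumptions, not the SDK's API: the function names are hypothetical, each node is modelled as a plain dict, and the locator is assumed to be just the hex MD5 digest of the block. Manifest encoding and collection registration are not shown.

```python
import hashlib

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB upper bound on block size


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield consecutive chunks of at most block_size bytes."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def write_replicated(block: bytes, nodes: list[dict], copies: int) -> str:
    """Write one block to `copies` nodes and return its locator."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested redundancy")
    locator = hashlib.md5(block).hexdigest()
    for node in nodes[:copies]:
        node[locator] = block
    return locator


def read_verified(locator: str, nodes: list[dict]) -> bytes:
    """Return the first copy whose MD5 digest matches the locator."""
    for node in nodes:
        block = node.get(locator)
        if block is not None and hashlib.md5(block).hexdigest() == locator:
            return block
    raise LookupError(f"no intact copy of {locator} found")
```

Since the reader checks every retrieved copy against the locator, a corrupted copy on one node is skipped and the next node is tried.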

The responsibilities of management agents are:

* Verify validity of permission tokens
* Determine which blocks have higher or lower redundancy than required
* Monitor disk space and move or delete blocks as needed
* Collect per-user, per-group, per-node, and per-disk usage statistics
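The redundancy check could be approached along these lines. The inputs are assumptions made for the example: @node_indexes@ maps each node name to the set of block locators it holds, and @required@ maps each locator to the number of copies wanted. How a real agent would obtain either is not specified here.

```python
from collections import Counter


def audit_replication(node_indexes: dict[str, set[str]],
                      required: dict[str, int]):
    """Compare actual copy counts with the required counts.

    Returns two dicts keyed by locator: how many copies are missing
    (under-replicated) and how many are surplus (over-replicated).
    Only blocks listed in `required` are considered.
    """
    counts = Counter(
        locator for blocks in node_indexes.values() for locator in blocks
    )
    under = {}
    over = {}
    for locator, wanted in required.items():
        have = counts[locator]  # 0 if no node reports the block
        if have < wanted:
            under[locator] = wanted - have
        elif have > wanted:
            over[locator] = have - wanted
    return under, over
```

The two result sets map onto the follow-up actions in the list: under-replicated blocks need more copies, and over-replicated blocks are candidates for deletion when disk space runs low.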