Keep-balance » History » Version 4

Tom Clegg, 02/14/2014 10:22 AM

h1. Data Manager

The Data Manager enforces policies and generates reports about storage resource usage. The Data Manager interacts with the [[Keep server]] and the metadata database. Clients/users do not interact with the Data Manager directly.

See also:
* [[Keep server]]
* [[Keep manifest format]]
* source: n/a (design phase)

Responsibilities:
* Garbage collector: decide what is eligible for deletion (and some partial order of preference)
* Replication enforcer: copy and delete blobs in various backing stores to achieve desired replication level
* Rebalancer: move blobs to redistribute free space and reduce client probes
* Tell managers how much disk space is being conserved due to CAS
* Tell managers how much disk space is occupied in a given backing store service
* Tell managers how disk usage would be affected by modifying storage policy
* Tell managers how much disk space+time is used (per user, group, node, disk)
* Tell users when replication/policy specified for a collection is not currently satisfied (and why, for how long, etc.)
* Tell users how much disk space is represented by a given set of collections
* Tell users how much disk space can be made available by garbage collection
* Tell users how soon they should expect their cached data to disappear
* Tell users performance statistics (how fast should I expect my job to read data?)
* Tell ops where each block was most recently read/written, in case data recovery is needed
* Tell ops how unbalanced the backing stores are across the cluster
* Tell ops activity level and performance statistics
* Tell ops activity level vs. amount of space (how much of the data is being accessed by users?)
* Tell ops disk performance/error/status trends (and SMART reports) to help identify bad hardware
* Tell ops history of disk adds, removals, moves
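
As a rough illustration of the garbage collector, replication enforcer, and CAS savings report listed above, the following sketch works on a small in-memory index. Everything in it is an assumption made for illustration: the names @collections@, @block_disks@, @block_sizes@ and the three functions are hypothetical, a single cluster-wide replication level stands in for per-collection policy, and a real Data Manager would read this data from the Keep servers and the metadata database rather than from dictionaries.

```python
# Hypothetical in-memory index, for illustration only (not a Keep API).
# collections: collection id -> list of block hashes it references
# block_disks: block hash -> set of disk ids holding a copy
# block_sizes: block hash -> size in bytes
collections = {"c1": ["a", "b"], "c2": ["b", "c"]}
block_disks = {
    "a": {"d1", "d2"},
    "b": {"d1"},
    "c": {"d1", "d2", "d3"},
    "x": {"d2"},
}
block_sizes = {"a": 100, "b": 200, "c": 50, "x": 10}


def referenced_blocks(collections):
    """Return the set of blocks referenced by at least one collection."""
    return {block for blocks in collections.values() for block in blocks}


def gc_candidates(collections, block_disks):
    """Return stored blocks that no collection references, sorted by hash."""
    return sorted(set(block_disks) - referenced_blocks(collections))


def replication_report(collections, block_disks, desired=2):
    """Return (under, over): referenced blocks mapped to missing / surplus copies.

    Assumes one desired replication level for every block.
    """
    under, over = {}, {}
    for block in referenced_blocks(collections):
        copies = len(block_disks.get(block, ()))
        if copies < desired:
            under[block] = desired - copies
        elif copies > desired:
            over[block] = copies - desired
    return under, over


def cas_savings(collections, block_sizes):
    """Bytes saved by storing each distinct block once instead of once per reference.

    Replication is ignored here; only deduplication is counted.
    """
    logical = sum(block_sizes[b] for blocks in collections.values() for b in blocks)
    unique = sum(block_sizes[b] for b in referenced_blocks(collections))
    return logical - unique
```

With the sample data, block @x@ is the only garbage collection candidate, block @b@ is one copy short, block @c@ has one surplus copy, and content addressing saves the 200 bytes of the shared block @b@.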

Basic kinds of data in the index:
* Which blocks are used by which collections (and which collections are valued by which users/groups)
* Which blocks are stored on which disks
* Which disks are attached to which nodes
* Read events
* Write events
* Exceptions (checksum mismatch, IO error)

h2. Implementation considerations

Overview
* REST service
* API server may cache/proxy some queries
* API server may redirect some queries

Permissions
* Support +A tokens like [[Keep server]] when accepting collection/blob UUIDs in requests?
* Require admin api_token for some queries, site-configurable?

Distributed/asynchronous
* Easy to run multiple Keep index services.
* Most features do not need synchronous operation / real-time data.
* Features that move or delete data should be tied to a single "primary" indexing service (a failover event likely requires resetting some state).
* Substantial disagreement between multiple index services should be easy to flag on the admin dashboard.
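
One possible way to flag substantial disagreement between index services is to compare their block-to-disk maps and report the fraction of blocks on which they differ. The sketch below is only an assumption about how such a check might look: @index_disagreement@, @should_flag@, the sample indexes, and the 5% threshold are all hypothetical and would need to be tuned per site.

```python
# Hypothetical comparison of two index services' views, for illustration only.
# Each index maps a block hash to the set of disk ids believed to hold it.
primary = {"a": {"d1"}, "b": {"d1", "d2"}, "c": {"d3"}, "d": {"d1"}}
secondary = {"a": {"d1"}, "b": {"d1"}, "c": {"d3"}, "d": {"d1"}}


def index_disagreement(index_a, index_b):
    """Return the fraction of blocks, known to either index, whose disk sets differ."""
    all_blocks = set(index_a) | set(index_b)
    if not all_blocks:
        return 0.0
    differing = sum(
        1
        for block in all_blocks
        if set(index_a.get(block, ())) != set(index_b.get(block, ()))
    )
    return differing / len(all_blocks)


def should_flag(index_a, index_b, threshold=0.05):
    """Return True when disagreement exceeds the (assumed) dashboard threshold."""
    return index_disagreement(index_a, index_b) > threshold
```

Because most features do not need real-time data, a small amount of disagreement is expected while the services catch up with each other, which is why the check uses a threshold rather than requiring the indexes to match exactly.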