Keep storage tiers » History » Version 5
Tom Clegg, 05/01/2017 07:17 PM
h1. Keep storage tiers

Typically, an Arvados cluster has access to multiple storage devices with different cost/performance trade-offs.

Examples:
* Local SSD
* Local HDD
* Object storage service provided by cloud vendor
* Slower or less reliable object storage service provided by same cloud vendor

Users should be able to specify a minimum storage tier for each collection. Arvados should ensure that every data block referenced by a collection is stored at the specified tier _or better_.

The cluster administrator should be able to specify a default tier, and assign a tier number to each storage device.

It should be possible to configure multiple storage devices at the same tier: for example, this allows blocks to be distributed more or less uniformly across several (equivalent) cloud storage buckets for performance reasons.

h1. Implementation (proposal)

Storage tier features (and implementation) are similar to replication-level features.

h2. Configuration

Each Keep volume has an integer parameter, "tier". Interpretation is site-specific, except that when M≤N, tier M can satisfy a requirement for tier N, i.e., smaller tier numbers are better. Some volume drivers are capable of discovering the tier number for a volume by inspecting the underlying storage device (e.g., a cloud storage bucket) but in all cases a sysadmin can specify a value.

There is a site-wide default tier number which is used for collections that do not specify a desired tier. Typically this is tier 1.

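
The tier comparison rule and the default-tier fallback can be sketched as follows. This is a minimal illustration, not keepstore code: the function names @tier_satisfies@ and @effective_tier@ and the constant @DEFAULT_TIER@ are hypothetical, and the default of 1 only reflects the "typically tier 1" remark above.

```python
from typing import Optional

# Assumed site-wide default; the proposal says this is typically tier 1.
DEFAULT_TIER = 1


def tier_satisfies(volume_tier: int, required_tier: int) -> bool:
    """Return True if a volume at volume_tier can satisfy required_tier.

    Tier M satisfies a requirement for tier N when M <= N,
    i.e. smaller tier numbers are better.
    """
    return volume_tier <= required_tier


def effective_tier(tier_desired: Optional[int],
                   default_tier: int = DEFAULT_TIER) -> int:
    """Return the tier to enforce, falling back to the site default
    when the collection does not specify one."""
    return default_tier if tier_desired is None else tier_desired
```

For example, @tier_satisfies(1, 2)@ is true (a tier 1 volume is good enough for a tier 2 requirement), while @tier_satisfies(3, 2)@ is false.
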
h2. Storing data at a non-default tier

Tools that write data to Keep should allow the caller to specify a storage tier. The desired tier is sent to Keep services as a header (X-Keep-Desired-Tier) with each write request. Keep services return an error when the data cannot be written to the requested tier (or better).

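
One way a Keep service might act on the header is sketched below. Only the header name @X-Keep-Desired-Tier@ comes from this proposal; the @Volume@ class, @TierUnavailableError@, and @choose_volume@ are hypothetical names. The proposal does not say how to pick among several eligible volumes, so the sketch simply assumes the first eligible volume in configuration order.

```python
from dataclasses import dataclass
from typing import Dict, List

DESIRED_TIER_HEADER = "X-Keep-Desired-Tier"  # header name from the proposal


class TierUnavailableError(Exception):
    """Hypothetical error: no writable volume at the requested tier or better."""


@dataclass
class Volume:
    name: str
    tier: int
    read_only: bool = False


def choose_volume(headers: Dict[str, str], volumes: List[Volume],
                  default_tier: int = 1) -> Volume:
    """Pick a writable volume whose tier is at least as good as requested.

    Falls back to default_tier when the header is absent. Selection among
    eligible volumes (first match) is an assumption of this sketch.
    """
    desired = int(headers.get(DESIRED_TIER_HEADER, default_tier))
    for volume in volumes:
        if not volume.read_only and volume.tier <= desired:
            return volume
    raise TierUnavailableError(
        f"no writable volume at tier {desired} or better")
```

A caller that cannot be served gets an error rather than a silent write to a worse tier, which matches the behavior described above.
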
h2. Moving data between tiers

Each collection has an integer field, "tier_desired". If tier_desired is not null, all blocks referenced by the collection should be stored at the given tier (or better).

Keep-balance tracks the maximum allowed tier for each block, and moves blocks between tiers as needed. The strategy is similar to fixing rendezvous probe order: if a block is stored at the wrong tier, a new copy is made at the correct tier; then, in a subsequent balancing operation, the redundant copy is detected and deleted. _This increases the danger of data loss due to races between concurrent keep-balance processes. Keep-balance should have a reliable way to detect/avoid concurrent balancing operations._

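
The copy-then-delete strategy can be sketched roughly as below. This is an illustration under stated assumptions, not keep-balance code: @max_allowed_tiers@ and @plan_action@ are hypothetical names, replication levels are ignored, and a copy is treated as "wrong" only when its tier is worse than allowed (copies at a better tier than required are left alone here).

```python
from typing import Dict, Iterable, List, Optional, Tuple


def max_allowed_tiers(
        collections: Iterable[Tuple[Optional[int], Iterable[str]]],
        default_tier: int = 1) -> Dict[str, int]:
    """Map each block to its maximum allowed tier.

    Each collection is (tier_desired, block locators). A block referenced
    by several collections gets the strictest (numerically smallest) tier.
    """
    allowed: Dict[str, int] = {}
    for tier_desired, blocks in collections:
        tier = default_tier if tier_desired is None else tier_desired
        for block in blocks:
            allowed[block] = min(tier, allowed.get(block, tier))
    return allowed


def plan_action(stored_tiers: List[int], allowed_tier: int) -> str:
    """Decide what one balancing pass does for a single block.

    "copy":  no existing copy is at an acceptable tier, so make one.
    "trash": an acceptable copy exists, so worse-tier copies are redundant.
    "ok":    every copy is already at an acceptable tier.
    """
    acceptable = [t for t in stored_tiers if t <= allowed_tier]
    if not acceptable:
        return "copy"
    if len(acceptable) < len(stored_tiers):
        return "trash"
    return "ok"
```

A block stored only at tier 3 with an allowed tier of 2 yields @"copy"@ on the first pass; once the tier 2 copy exists, the next pass yields @"trash"@ for the tier 3 copy. The gap between those two passes is where concurrent keep-balance processes could race.
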
(Note: the following section uses the term "mount" to mean what the keepstore code base calls a "volume": i.e., an attachment of a storage device to a keepstore process.)

To facilitate tracking in keep-balance, keepstore must provide a way for keep-balance to see which blocks are stored on which mount points, and copy/delete blocks to/from specific mount points:
* A "mounts" request (@GET /mounts@) should return information about all currently mounted volumes, e.g., @{"UUID":"zzzzz-aaaaa-aaaabbbbccccddd","Tier":1,"ReadOnly":false,"DeviceID":"9febe660-c4e4-4db4-9f59-fbc9d559547c/keep"}@ ("DeviceID" is a string that can enable keep-balance to detect when multiple Keep mounts, possibly on multiple keepstore nodes, are using the same underlying storage device).
* A block-index request for a specific mount (@GET /mounts/zzzzz-aaaaa-aaaabbbbccccddd/blocks@) should return a list of blocks stored on that mount.
* An entry in a pull request may include a "MountUUID" field indicating which mount the new copy should be written to.
* An entry in a trash request may include a "MountUUID" field indicating which mount the block should be deleted from.

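
As a rough illustration of how keep-balance might consume this information, the sketch below groups mounts by "DeviceID" and attaches a "MountUUID" to an existing pull or trash entry. The field names UUID, Tier, ReadOnly, DeviceID, and MountUUID come from the list above; the helper names, the second mount UUID, the @"locator"@ key in the usage example, and the assumption that the mounts response is a JSON list are all hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def shared_devices(mounts: List[dict]) -> Dict[str, List[str]]:
    """Return DeviceID -> mount UUIDs for devices used by more than one mount."""
    by_device: Dict[str, List[str]] = defaultdict(list)
    for mount in mounts:
        by_device[mount["DeviceID"]].append(mount["UUID"])
    return {dev: uuids for dev, uuids in by_device.items() if len(uuids) > 1}


def with_mount(entry: dict, mount_uuid: str) -> dict:
    """Return a copy of a pull or trash entry targeting a specific mount."""
    return {**entry, "MountUUID": mount_uuid}


# Assumed shape of a GET /mounts response body (a JSON list of mounts).
response_body = """[
  {"UUID": "zzzzz-aaaaa-aaaabbbbccccddd", "Tier": 1, "ReadOnly": false,
   "DeviceID": "9febe660-c4e4-4db4-9f59-fbc9d559547c/keep"},
  {"UUID": "zzzzz-aaaaa-aaaabbbbcccceee", "Tier": 1, "ReadOnly": true,
   "DeviceID": "9febe660-c4e4-4db4-9f59-fbc9d559547c/keep"}
]"""
mounts = json.loads(response_body)
duplicates = shared_devices(mounts)
```

Here @duplicates@ maps the one shared DeviceID to both mount UUIDs, so keep-balance would know not to count them as two independent copies. Calling @with_mount({"locator": "..."}, "zzzzz-aaaaa-aaaabbbbccccddd")@ returns the same entry with a "MountUUID" field added and leaves the original untouched.
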
h2. Reporting

After each rebalance operation, keep-balance logs a summary of discrepancies between actual and desired allocation of blocks to storage tiers. Examples:
* N blocks (M bytes) are stored at tier 3 but are referenced by collections at tier 2.
* N blocks (M bytes) are stored at tier 1 but are not referenced by any collections at tier T<2.
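
A summary in this style could be produced along the following lines. This is only a sketch: @summarize@ is a hypothetical name, each block is assumed to have a single stored copy, and the input tuples (size in bytes, stored tier, maximum allowed tier) are an assumed intermediate form rather than anything keep-balance is known to produce.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize(blocks: Iterable[Tuple[int, int, int]]) -> List[str]:
    """Build log lines from (size_bytes, stored_tier, allowed_tier) tuples."""
    # (stored, allowed) -> [block count, byte count] for copies at too poor a tier
    under: Dict[Tuple[int, int], List[int]] = defaultdict(lambda: [0, 0])
    # stored -> [block count, byte count] for copies at a better tier than needed
    over: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for size, stored, allowed in blocks:
        if stored > allowed:
            counter = under[(stored, allowed)]
        elif stored < allowed:
            counter = over[stored]
        else:
            continue  # stored exactly at the allowed tier: no discrepancy
        counter[0] += 1
        counter[1] += size

    lines = []
    for (stored, allowed), (count, nbytes) in sorted(under.items()):
        lines.append(
            f"{count} blocks ({nbytes} bytes) are stored at tier {stored} "
            f"but are referenced by collections at tier {allowed}.")
    for stored, (count, nbytes) in sorted(over.items()):
        lines.append(
            f"{count} blocks ({nbytes} bytes) are stored at tier {stored} "
            f"but are not referenced by any collections at tier T<{stored + 1}.")
    return lines
```

The second message follows the example above: a block stored at tier S is reported when no referencing collection requires tier S or better, i.e. none at tier T<S+1.
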