Keep storage classes » History » Version 15
Nico César, 02/16/2021 07:37 PM
h1. Keep storage classes

-Top level ticket for implementation: https://dev.arvados.org/issues/11184-

Epic as of 2021-02-16: https://dev.arvados.org/issues/16107

h2. Use cases

* Partition data among several cloud buckets for legal or financial reasons.
* Shift data from "hot" to "cool" storage (e.g. SSD to disk) for price/performance tradeoff.
* Move data from on-line to off-line storage (e.g. Glacier) but maintain provenance.

h2. Requirements

* arv-put and arv-copy have an option to specify a storage class.
** When writing each block, the client can specify the storage class for the block.
** Use the API to specify that the blocks belonging to a collection should go into a certain storage class.
* Workbench permits changing the storage class on a collection.
* arvados-cwl-runner has options to specify storage classes for intermediate and final output collections.
** Use the API to specify the storage class for the output collection of a container request.
* TBD: access controls on storage classes. Can we restrict which users can place collections in which storage class?
* TBD: rules for de-duplicating blocks across classes? (e.g., if collections with identical data exist in "hot" & "cool" classes, do we really need a copy of the data in "cool" as well as the copy in "hot"?)

h2. Design

A "storage class" is effectively a tagging scheme to specify a group of keep volumes where a block should be preferentially stored.

Generalized from [[Keep storage tiers]] (but unlike the storage tiers proposal, there is no implied price/performance relationship between classes).

Each keepstore service has access to one or more storage classes. Storage classes are independent. Data assigned to a class may still be sharded among multiple servers. Classes are identified with labels or UUIDs instead of integers. The keep services table adds a column that lists which classes are available at which services.

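As a rough illustration of the proposed services-table column, the sketch below filters services by the classes they advertise. The @KeepService@ type, its field names, and the service and class labels are hypothetical and not part of any existing API.

```python
from dataclasses import dataclass, field


@dataclass
class KeepService:
    """Hypothetical row of the keep services table, with the proposed classes column."""
    uuid: str
    storage_classes: frozenset = field(default_factory=frozenset)


def services_offering(services, wanted_class):
    """Return the services that advertise the given storage class."""
    return [s for s in services if wanted_class in s.storage_classes]


# Example labels only; class names are site-defined.
services = [
    KeepService("keep0", frozenset({"hot"})),
    KeepService("keep1", frozenset({"hot", "cool"})),
    KeepService("keep2", frozenset({"cool"})),
]
cool_services = [s.uuid for s in services_offering(services, "cool")]
```

A client could use a lookup like this to decide which services to try when writing a block for a given class.
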
When writing blocks, keepstore recognizes a header @X-Keep-Storage-Classes@ and accepts or denies the block based on whether it can place the block in the designated classes. If the header is not supplied, keepstore should place the block in a default pool. The value of @X-Keep-Storage-Classes@ should be reported in the response.

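The accept/deny decision could look roughly like the sketch below. Several details are assumptions, since the text above does not fix them: the header value is treated as a comma-separated list, the default pool is named @default@, a block is accepted only if every requested class is offered, and a refusal is reported as HTTP 503.

```python
DEFAULT_CLASSES = frozenset({"default"})  # assumed name for the default pool


def parse_storage_classes(header_value):
    """Parse X-Keep-Storage-Classes, assumed here to be a comma-separated list."""
    if not header_value:
        return DEFAULT_CLASSES
    classes = frozenset(c.strip() for c in header_value.split(",") if c.strip())
    return classes or DEFAULT_CLASSES


def handle_put(headers, volumes):
    """Decide whether this keepstore can place a block.

    volumes maps a volume name to the set of classes that volume offers.
    Returns (status, response_headers).
    """
    wanted = parse_storage_classes(headers.get("X-Keep-Storage-Classes"))
    offered = set().union(*volumes.values())
    if not wanted <= offered:
        return 503, {}  # assumed status code for "cannot satisfy these classes"
    # Report the classes back to the client, as the design asks.
    return 200, {"X-Keep-Storage-Classes": ", ".join(sorted(wanted))}


volumes = {"vol-ssd": {"hot"}, "vol-disk": {"cool", "default"}}
status, response_headers = handle_put({"X-Keep-Storage-Classes": "hot, cool"}, volumes)
```

Echoing the classes in the response lets the client confirm which classes the stored copy actually satisfies.
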
Each keepstore volume (mount) is associated with a number of storage classes.

Collections may specify a desired set of classes for the blocks in the collection. Keep balance should move blocks to volumes that offer the desired classes. If multiple collections reference the same block but specify different sets of classes, multiple copies may be required.

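One way to picture the balancing rule is to take, for each block, the union of the classes desired by every collection that references it, then compare that with where copies exist today. The sketch below is only an illustration: the input shapes and function names are hypothetical, and it ignores replication counts.

```python
from collections import defaultdict


def desired_classes_by_block(collections):
    """Union the desired classes of every collection that references a block.

    collections: iterable of (block_locators, desired_classes) pairs.
    """
    desired = defaultdict(set)
    for blocks, classes in collections:
        for block in blocks:
            desired[block] |= set(classes)
    return dict(desired)


def missing_classes(desired, stored):
    """For each block, find the desired classes that hold no copy yet.

    stored maps a block to the set of classes where a copy currently exists.
    """
    todo = {}
    for block, classes in desired.items():
        lacking = classes - stored.get(block, set())
        if lacking:
            todo[block] = lacking
    return todo


# blockB is referenced by two collections that want different classes.
collections = [(["blockA", "blockB"], {"hot"}), (["blockB"], {"cool"})]
stored = {"blockA": {"hot"}, "blockB": {"hot"}}
todo = missing_classes(desired_classes_by_block(collections), stored)
```

Here @blockB@ ends up needing an additional copy in "cool", which is the multiple-copies case described above.
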
Data management policies, such as "move data from hot storage to cool storage after 1 month", should be implemented on top of the keepstore layer with additional tooling/scripts that set storage classes on collections.

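A policy script of this kind might first compute which collections to retag and then apply the changes through the API. The sketch below covers only the planning step; the dictionary keys (@uuid@, @modified_at@, @storage_classes@), the 30-day threshold, and the class names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def plan_cooldown(collections, now, max_age=timedelta(days=30)):
    """Return {uuid: new_classes} for "hot" collections older than max_age.

    Each collection is a dict with hypothetical keys "uuid",
    "modified_at" (timezone-aware datetime) and "storage_classes" (list).
    """
    plan = {}
    for c in collections:
        classes = set(c["storage_classes"])
        if "hot" in classes and now - c["modified_at"] > max_age:
            # Replace "hot" with "cool" and keep any other classes.
            plan[c["uuid"]] = sorted((classes - {"hot"}) | {"cool"})
    return plan


now = datetime(2021, 2, 16, tzinfo=timezone.utc)
collections = [
    {"uuid": "c-old", "modified_at": now - timedelta(days=45), "storage_classes": ["hot"]},
    {"uuid": "c-new", "modified_at": now - timedelta(days=3), "storage_classes": ["hot"]},
    {"uuid": "c-cool", "modified_at": now - timedelta(days=90), "storage_classes": ["cool"]},
]
plan = plan_cooldown(collections, now)
```

Keeping the planning step free of API calls makes the policy easy to test and to run in a dry-run mode before any collection is changed.
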
Storage classes could be used for moving data into long-term storage (e.g. Glacier, tape backup, etc). As an example, the user would change the storage class to "glacier", which would copy the blocks into offline storage and delete them from the online storage. To retrieve the blocks, the user would change the storage class to "s3". This would fetch the blocks and copy them back to online storage. (TBD: how does the client find out when the data actually becomes available?)

h2. Development tasks

See tickets in #16107