Keep storage classes » History » Version 6

Peter Amstutz, 06/13/2017 11:57 PM

h1. Keep storage groups

h2. Use cases

* A user has the option to store some data in cheaper storage, but only certain data qualifies. This can be indicated on a per-collection basis.
* A user wants data moved from "hot" to "cool" storage a certain amount of time after it has been generated.

h2. Requirements

* arv-put has an option to specify the storage group.
** When writing blocks, the client can specify the storage group for each block.
** The API can be used to specify that the blocks belonging to a collection should go into a certain storage group.
* The API can be used to specify the storage group for the output collection of a container request.
** arvados-cwl-runner has options to specify storage groups for intermediate and final output collections.

h2. Design

A "pool" (the "storage group" of the requirements above) is effectively a tagging scheme that specifies a subset of keep servers where a block should be preferentially stored.
Related to (but not the same thing as) [[Keep storage tiers]]. For some use cases, the assumption of a roughly linear relationship between slow/cheap and fast/expensive doesn't necessarily hold.
Each service has access to one or more storage pools. Storage pools are independent: there is no implied relationship between them. Data assigned to a pool may still be sharded among multiple servers. Pools can be identified with labels or UUIDs instead of integers. The keep services table gains a column listing the pools available at each service.
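As a minimal sketch of how a client might use the proposed column, the example below filters keep service records by pool. The @pools@ field name, the pool labels, and the UUIDs are illustrative assumptions, not part of any existing Arvados schema.

```python
# Hypothetical keep service records. The "pools" field stands in for the
# proposed new column; it is an assumption, not an existing attribute.
KEEP_SERVICES = [
    {"uuid": "zzzzz-bi6l4-000000000000001", "pools": ["default", "hot"]},
    {"uuid": "zzzzz-bi6l4-000000000000002", "pools": ["default"]},
    {"uuid": "zzzzz-bi6l4-000000000000003", "pools": ["hot", "archive"]},
]


def services_for_pool(services, pool):
    """Return the UUIDs of the services that advertise the given pool."""
    return [svc["uuid"] for svc in services if pool in svc["pools"]]
```

Because pools are plain labels with no ordering, the lookup is a simple membership test. A block assigned to a pool could still be sharded across every service the function returns.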
When writing blocks, keepstore recognizes an @X-Keep-Pool@ header and accepts or denies the block based on whether it can place the block in the designated pool. If the header is not supplied, keepstore should fall back to a default pool. The value of @X-Keep-Pool@ should be reported in the response.
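The accept/deny decision described above could look roughly like the following. This is a sketch only: the function name, the @"default"@ pool label, and the 503 status used for a denial are assumptions, since the design does not specify them.

```python
DEFAULT_POOL = "default"  # assumed label for the default pool


def handle_block_write(request_headers, local_pools, default_pool=DEFAULT_POOL):
    """Decide whether this keepstore can accept a block write.

    request_headers: dict of request headers (exact-case keys, for brevity).
    local_pools: set of pool labels served by this keepstore's mounts.
    Returns (status_code, response_headers).
    """
    # Fall back to the default pool when the client sends no header.
    pool = request_headers.get("X-Keep-Pool", default_pool)
    # The chosen pool is reported back whether or not the write is accepted.
    response_headers = {"X-Keep-Pool": pool}
    if pool not in local_pools:
        # The status code is an assumption; the design only says "denies".
        return 503, response_headers
    return 200, response_headers
```

Echoing the header in the response lets a client confirm which pool was actually used, including the case where it sent no header and the default applied.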
A keepstore mount is associated with a specific pool.
Collections may specify a desired pool for the blocks in the collection.  Keep balance should move blocks to the desired pool.  If multiple collections reference the same block in different pools, each pool should have a copy.
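To illustrate the rule that a block referenced from several pools needs a copy in each, here is a small sketch of the bookkeeping a balancing pass might do. The record layout (@pool@, @blocks@) and the function names are hypothetical, and the block identifiers are placeholders rather than real locators.

```python
from collections import defaultdict


def desired_pools(collections):
    """Map each block to the set of pools requested by any collection."""
    wanted = defaultdict(set)
    for coll in collections:
        for block in coll["blocks"]:
            wanted[block].add(coll["pool"])
    return dict(wanted)


def missing_copies(wanted, current):
    """Return, per block, the pools that still need a copy of it.

    current maps each block to the set of pools that already hold it.
    Blocks that are already in every desired pool are left out.
    """
    result = {}
    for block, pools in wanted.items():
        missing = pools - current.get(block, set())
        if missing:
            result[block] = missing
    return result
```

For example, if one collection in pool @hot@ and another in pool @archive@ both reference the same block, @desired_pools@ yields both labels for it, and @missing_copies@ reports whichever pool does not yet hold a copy.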
Data management policies, for example "move data from hot storage to cold storage if not accessed after 1 month", should be implemented with additional tooling/scripts on top of the keepstore later.
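A policy script of that kind might start from a selection step like the one below. It is a sketch under several assumptions: the @last_accessed@ field, the @hot@ pool label, and the record layout are hypothetical, and "1 month" is approximated as 30 days.

```python
from datetime import datetime, timedelta


def collections_to_cool(collections, now, max_idle=timedelta(days=30)):
    """Return UUIDs of collections in the hot pool idle longer than max_idle.

    Each record is assumed to carry "uuid", "pool" and "last_accessed"
    (a datetime); none of these names come from the design itself.
    """
    return [
        coll["uuid"]
        for coll in collections
        if coll["pool"] == "hot" and now - coll["last_accessed"] > max_idle
    ]
```

The script would then update the desired pool of each selected collection through the API and leave the actual block movement to Keep balance, as described in the design.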