Tom Clegg, 05/08/2019 06:26 PM
h1. Keep data block life cycle

Unlike most storage systems, Keep uses content-addressed data blocks, so usage of back-end storage does not correspond directly to the amount of data being stored at any given moment.

If a single data block is referenced by 7 collections with desired replication 1, and by 7 other collections with desired replication 2:
* two copies of the block are stored
* deleting one of the 14 collections has no impact on back-end usage

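The arithmetic in this example can be sketched in a few lines. This is only an illustration, assuming the number of stored copies equals the highest desired replication among the collections that still reference the block. The helper copies_needed is hypothetical and not part of Keep.

```python
def copies_needed(desired_replications):
    """Return how many copies of a block should exist, given the desired
    replication of every collection that references it (hypothetical helper)."""
    # Assumption: the block needs as many copies as the most demanding
    # referencing collection asks for; an unreferenced block needs none.
    return max(desired_replications, default=0)


# The example above: 7 collections at replication 1, 7 others at replication 2.
refs = [1] * 7 + [2] * 7
print(copies_needed(refs))       # 2
print(copies_needed(refs[1:]))   # 2 (one replication-1 collection deleted)
print(copies_needed(refs[:-1]))  # 2 (one replication-2 collection deleted)
print(copies_needed([]))         # 0 (all 14 collections deleted)
```

A result of 0 only means the block is no longer needed. It does not mean the storage has been freed.
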
Even when all 14 collections are deleted, the unreferenced data block cannot necessarily be deleted right away to free up storage space:
* Recently written blocks cannot be garbage collected because a client might have a reference in memory and use it to create a collection (or read the data back).
* Blocks referenced by recently retrieved collections cannot be garbage collected, for the same reason.
* Keep servers use a "trashed" state to accommodate eventually consistent back-end behavior (e.g. AWS S3) and to provide a safety net for recovering data that was deleted prematurely due to a bug or configuration problem.

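The first condition, combined with the reference check, can be written as a single predicate. This is a minimal sketch, assuming a block may be moved to the trashed state only when no collection references it and its newest write is at least BlobSignatureTTL old. The function name, its parameters, and the inclusive boundary are assumptions for illustration and do not reflect the actual Keep implementation.

```python
from datetime import datetime, timedelta

# Assumed value, matching the example setting BlobSignatureTTL = 10d.
BLOB_SIGNATURE_TTL = timedelta(days=10)


def may_trash(last_write, referenced, now, ttl=BLOB_SIGNATURE_TTL):
    """Hypothetical check: may a server move this block to the trashed state?"""
    if referenced:
        # Some collection still needs the block.
        return False
    # A client that wrote the block within the TTL may still hold a usable
    # reference in memory, so the block must be kept until the TTL has passed.
    return now - last_write >= ttl


now = datetime(2019, 5, 8)
print(may_trash(now - timedelta(days=3), False, now))   # False: written too recently
print(may_trash(now - timedelta(days=12), True, now))   # False: still referenced
print(may_trash(now - timedelta(days=12), False, now))  # True
```
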
For example, with TrashLifetime = 10d and BlobSignatureTTL = 10d, it takes at least 20d to recover the space used by a block (10d until the blob signature expires, then 10d in the trash), counting from the last time the data was written to Keep by a client _or_ referenced in a collection.

<pre>
time (days)  keep0  keep1  client  api/db  comment
------------ ------------ ------------ ------------ ------------
+0   write B1  write B1
+1   write B1
+2   write B1
+3   create collection C1 referencing B1
+4   trash collection C1 ("trash_at=now", which implies "delete_at=+10d")
+5   write B1
+13  (no action)  10d (blob signature TTL) since last write on keep0, but still referenced by C1
+14  collection C1 expires automatically
     trash B1  10d (blob signature TTL) since last write on keep0
+15  trash B1  10d (blob signature TTL) since last write on keep1
+24  delete B1  10d (trash lifetime) since trashed on keep0
+25  delete B1  10d (trash lifetime) since trashed on keep1
</pre>
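
The trash and delete days in the timeline can be reproduced with a short calculation. This is a sketch under stated assumptions: a server trashes its copy once BlobSignatureTTL has passed since its last write and the block is no longer referenced, then deletes it TrashLifetime later. The helper reclaim_days is hypothetical. The last-write day for keep1 (day 5) follows from the "+15" row. The day used for keep0 (day 2) is a guess, since the timeline does not make clear which server received each write. Any day up to 4 gives the same result.

```python
BLOB_SIGNATURE_TTL = 10  # days, as in the example
TRASH_LIFETIME = 10      # days, as in the example


def reclaim_days(last_write_day, unreferenced_day):
    """Return (trash_day, delete_day) for one server's copy of a block."""
    # The copy can be trashed once the signature TTL has passed since its
    # last write AND no collection references the block any more.
    trash_day = max(last_write_day + BLOB_SIGNATURE_TTL, unreferenced_day)
    # Trashed copies are kept for the trash lifetime before being deleted.
    return trash_day, trash_day + TRASH_LIFETIME


# C1 is trashed on day 4 with delete_at=+10d, so B1 is unreferenced from day 14.
unreferenced_day = 4 + 10

# Assumed last-write days per server (see the note above the code).
last_writes = {"keep0": 2, "keep1": 5}

for server, day in last_writes.items():
    trash, delete = reclaim_days(day, unreferenced_day)
    print(f"{server}: trash B1 at +{trash}, delete B1 at +{delete}")
# keep0: trash B1 at +14, delete B1 at +24
# keep1: trash B1 at +15, delete B1 at +25
```

On keep1 the space is recovered on day 25, which is 20 days after its last write on day 5 and matches the "at least 20d" figure.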