Tom Clegg, 05/08/2019 06:26 PM

h1. Keep data block life cycle

Unlike most storage systems, Keep uses content-addressed data blocks, so usage of back-end storage does not correspond directly to the amount of data being stored at any given moment.

If a single data block is referenced by 7 collections with desired replication 1, and by 7 other collections with desired replication 2:
* two copies of the block are stored (the highest desired replication among the collections that reference it)
* deleting one of the 14 collections has no impact on back-end usage
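
The copy count in this example can be checked with a short sketch. It assumes that the number of stored copies equals the highest desired replication among the collections that still reference the block. The function name @copies_stored@ and the plain-list representation are hypothetical, chosen only for illustration; this is not Keep code.

```python
def copies_stored(desired_replications):
    """Return the number of copies kept for one block.

    Assumption: a block is kept at the highest desired replication
    among the collections that still reference it. With no references
    left, 0 copies are needed (ignoring any grace period before the
    space is actually freed).
    """
    return max(desired_replications, default=0)


# 7 collections want 1 replica, 7 others want 2 replicas.
refs = [1] * 7 + [2] * 7
print(copies_stored(refs))       # 2

# Deleting one replication-1 collection changes nothing.
print(copies_stored(refs[1:]))   # 2

# Deleting one replication-2 collection changes nothing either.
print(copies_stored(refs[:-1]))  # 2
```

Under this assumption, back-end usage only drops once the last replication-2 collection is gone (to one copy) or every collection is gone (to zero copies).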

Even when all 14 collections are deleted, the unreferenced data block cannot necessarily be deleted right away to free up storage space:
* Recently written blocks cannot be garbage collected because a client might have a reference in memory and use it to create a collection (or read the data back).
* Blocks referenced by recently retrieved collections cannot be garbage collected, for the same reason.
* Keep servers use a "trashed" state to accommodate eventually consistent backend behavior (AWS S3) and to provide a safety net for recovering data that was deleted prematurely due to a bug or configuration problem.

For example, with TrashLifetime = 10d and BlobSignatureTTL = 10d, it takes at least 20d to recover the space used by a block, counting from the last time the data was written to Keep by a client _or_ referenced in a collection.

<pre>
time (days)    keep0          keep1          client         api/db         comment
               ------------   ------------   ------------   ------------   ------------
+0             write B1       write B1
+1             write B1
+2             write B1
+3                                           create C1                     collection C1 references B1
+4                                           trash C1                      "trash_at=now", which implies "delete_at=+10d"
+5                            write B1
+13            (no action)                                                 more than 10d (blob signature TTL) since last write on keep0, but B1 is still referenced by C1
+14                                                         C1 expires     collection C1 expires automatically (delete_at reached)
               trash B1                                                    B1 is no longer referenced, and more than 10d (blob signature TTL) since last write on keep0
+15                           trash B1                                     10d (blob signature TTL) since last write on keep1
+24            delete B1                                                   10d (trash lifetime) since trashed on keep0
+25                           delete B1                                    10d (trash lifetime) since trashed on keep1
</pre>
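The trash and delete days in the table above can be reproduced with a small sketch. It assumes two simplified rules: a block becomes trashable on a server once it is unreferenced and the blob signature TTL has passed since its last write there, and it is deleted one trash lifetime after being trashed. The function @trash_and_delete_day@ and its parameters are hypothetical names for illustration; this is not keepstore or keep-balance code, and it ignores the extra grace period for recently retrieved collections.

```python
def trash_and_delete_day(last_write, unreferenced_since,
                         blob_signature_ttl=10, trash_lifetime=10):
    """Return (trash_day, delete_day) for one block on one server.

    Assumptions (simplified from the timeline, all values in days):
    - the block can be trashed once it is unreferenced AND
      blob_signature_ttl days have passed since its last write;
    - the block is deleted trash_lifetime days after it was trashed.
    """
    trash_day = max(last_write + blob_signature_ttl, unreferenced_since)
    return trash_day, trash_day + trash_lifetime


# C1 is trashed at +4 with delete_at=+10d, so B1 is unreferenced from +14.
keep0 = trash_and_delete_day(last_write=2, unreferenced_since=14)
keep1 = trash_and_delete_day(last_write=5, unreferenced_since=14)
print(keep0)  # (14, 24)
print(keep1)  # (15, 25)
```

On keep0 the limiting factor is the collection reference (+14), while on keep1 it is the blob signature TTL after the last write (+5 plus 10d), which is why the two servers free the space one day apart.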