Keep data block life cycle
Unlike most storage systems, Keep uses content-addressed data blocks, so usage of back-end storage does not correspond directly to the amount of data being stored at any given moment.
If a single data block is referenced by 7 collections with desired replication 1, and by 7 other collections with desired replication 2:
- two copies of the block are stored
- deleting one of the 14 collections has no impact on back-end usage
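The replication example can be sketched in a few lines of Python. This is only an illustration: `stored_copies` is a hypothetical helper, and it assumes a block is kept at the highest replication level requested by any collection that references it, which is consistent with the numbers above but is not taken from Keep's code.

```python
def stored_copies(desired_replications):
    """Return how many copies of a block the back end needs to hold.

    Assumption: the block is stored at the highest replication level
    requested by any referencing collection; with no references at all,
    no copies are needed (once garbage collection has run).
    """
    return max(desired_replications, default=0)


# 7 collections want replication 1, 7 others want replication 2.
refs = [1] * 7 + [2] * 7
print(stored_copies(refs))       # 2

# Deleting any single collection leaves the requirement unchanged.
print(stored_copies(refs[1:]))   # 2 (one replication-1 collection removed)
print(stored_copies(refs[:-1]))  # 2 (one replication-2 collection removed)
```

Back-end usage only drops once every collection requesting the higher replication level is gone, and reaches zero only when no collection references the block at all.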
Garbage collection of blocks that are no longer needed is also delayed, for several reasons:
- Recently-written blocks cannot be garbage collected because a client might have a reference in memory and use it to create a collection (or read the data back).
- Blocks referenced by recently-retrieved collections cannot be garbage collected, for the same reason.
- Keep servers use a "trashed" state to accommodate eventually-consistent backend behavior (AWS S3) and to provide a safety net for recovering data that was deleted prematurely due to a bug or configuration problem.
For example, with TrashLifetime = 10d and BlobSignatureTTL = 10d, it takes at least 20d to recover the space used by a block -- starting at the last time the data was written to Keep by a client or referenced in a collection.
time (days)  actions (keep0 / keep1 / client / api/db)                             comment
-----------  --------------------------------------------------------------------  ------------
+0           write B1; write B1
+1           write B1
+2           write B1
+3           create collection C1 referencing B1
+4           trash collection C1 ("trash_at=now", which implies "delete_at=+10d")
+5           write B1
+13          (no action)                                                           10d (blob signature TTL) since last write on keep0, but still referenced by C1
+14          collection C1 expires automatically; trash B1                         10d (blob signature TTL) since last write on keep0
+15          trash B1                                                              10d (blob signature TTL) since last write on keep1
+24          delete B1                                                             10d (trash lifetime) since trashed on keep0
+25          delete B1                                                             10d (trash lifetime) since trashed on keep1
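The arithmetic behind the timeline can be reproduced with a small sketch. It is a simplified model, not Keep's actual logic: the function name and parameters are hypothetical, it ignores the "recently-retrieved collection" rule, and it assumes trash and delete operations happen as soon as they become possible.

```python
BLOB_SIGNATURE_TTL = 10  # days, as in the example above
TRASH_LIFETIME = 10      # days, as in the example above


def earliest_trash_and_delete(last_write, last_reference_expiry=None):
    """Return (trash_day, delete_day) for one block on one Keep server.

    Simplified model (an assumption): a block can be trashed once
    BLOB_SIGNATURE_TTL has passed since its last write on that server
    AND no collection references it any more; it can be deleted
    TRASH_LIFETIME after it was trashed.
    """
    trash_day = last_write + BLOB_SIGNATURE_TTL
    if last_reference_expiry is not None:
        # A block still referenced by a collection cannot be trashed yet.
        trash_day = max(trash_day, last_reference_expiry)
    return trash_day, trash_day + TRASH_LIFETIME


# keep1 in the timeline: last write at +5, collection C1 expires at +14.
print(earliest_trash_and_delete(5, 14))  # (15, 25)

# A block written once at +0 and never referenced by a collection.
print(earliest_trash_and_delete(0))      # (10, 20)
```

The second call shows the 20d minimum: even a block that is never referenced occupies space for BlobSignatureTTL plus TrashLifetime after its last write.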
Updated by Tom Clegg almost 4 years ago · 1 revisions