Idea #6260
Updated by Tom Clegg over 9 years ago
Write one or more integration tests to verify that Data Manager's _existing_ delete functionality (i.e., deletes blocks that are unreferenced; does not do anything about over-replicated blocks) works as desired:
* Verify that blocks not referenced in a collection are deleted from keepstore
* Verify that all blocks referenced from collections, and all blocks newer than the block signature TTL, are never deleted from keepstore
Minimal test, covering a miniature version of normal operation:
# bring up API and keepstore services (just like we do already in keepproxy_test.go)
# store some collections (with non-zero data)
# store some data blocks without referencing them in any collection
# back-date the block Mtimes so that keepstore will consider unreferenced data old enough to delete
# write some "transient" blocks as well, this time without back-dating their Mtimes
# get block index from all keepstores
# run data manager in "single run" mode
# wait for all keepstores to finish working their trash and pull lists (i.e., /status.json reports @status["PullQueue"]["Queued"]==0 && status["PullQueue"]["InProgress"]==0 && ...@)
# get block index from all keepstores, make sure nothing has been deleted except the blocks that are back-dated _and_ unreferenced
# make API calls to delete some of the collections
# reduce replication on some of the collections
# run data manager again in "single run" mode
# wait for all keepstores to finish working their trash and pull lists
# get block index from all keepstores, make sure:
#* all blocks appearing in non-deleted collections were not deleted
#* all non-recent blocks appearing only in deleted collections were deleted
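The "wait for all keepstores to finish" steps could be sketched roughly as follows. This is only an illustration, not existing test code: @queuesIdle@, @waitForIdle@ and the @fetch@ callback are hypothetical names, and the caller supplies the queue names to check. The issue text only spells out @PullQueue@; passing a trash queue name as well is an assumption about what the elided part of the condition covers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// queueStatus holds the two counters named in the issue text.
type queueStatus struct {
	Queued     int
	InProgress int
}

// queuesIdle reports whether every named queue in a /status.json body
// has nothing queued and nothing in progress. Other top-level keys in
// the status document are ignored.
func queuesIdle(body []byte, queueNames ...string) (bool, error) {
	var status map[string]json.RawMessage
	if err := json.Unmarshal(body, &status); err != nil {
		return false, err
	}
	for _, name := range queueNames {
		raw, ok := status[name]
		if !ok {
			return false, fmt.Errorf("status has no %q entry", name)
		}
		var q queueStatus
		if err := json.Unmarshal(raw, &q); err != nil {
			return false, err
		}
		if q.Queued != 0 || q.InProgress != 0 {
			return false, nil
		}
	}
	return true, nil
}

// waitForIdle calls fetch (hypothetical: e.g. an HTTP GET of one
// keepstore's /status.json) until the named queues are idle, and gives
// up with an error once the timeout has passed.
func waitForIdle(fetch func() ([]byte, error), timeout, interval time.Duration, queueNames ...string) error {
	deadline := time.Now().Add(timeout)
	for {
		body, err := fetch()
		if err != nil {
			return err
		}
		idle, err := queuesIdle(body, queueNames...)
		if err != nil {
			return err
		}
		if idle {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("queues still busy after %v", timeout)
		}
		time.Sleep(interval)
	}
}
```

Keeping the parsing separate from the polling loop means the idle check can be unit-tested against canned status documents without starting any services.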
Along the way, the test suite must confirm that the test data includes:
* some blocks that appear in at least one "non-deleted" and at least one "deleted" collection
* some blocks that appear in at least one "deleted" collection and no "non-deleted" collections, and are recent (i.e., written in the "transient" step, and therefore are not garbage)
* some blocks that never appeared in any collections, but are recent
* some blocks that never appeared in any collections, and are not recent
* some blocks that do appear in collections, and are not recent (i.e., being referenced by a collection is the only reason they're not garbage)
* of course, some blocks that actually get garbage-collected (i.e., the "garbage" set must not be empty!)
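One possible way to check that the test data covers every case listed above is to tag each block with three flags and look for categories that have no members. This is a sketch under assumptions: @blockInfo@, @isGarbage@, @categories@ and @missingCategories@ are hypothetical names, and "garbage" is taken to mean "not recent and not referenced by any non-deleted collection".

```go
package main

import "sort"

// blockInfo records what the test knows about one block it wrote.
type blockInfo struct {
	InKept    bool // referenced by at least one non-deleted collection
	InDeleted bool // referenced by at least one deleted collection
	Recent    bool // written in the "transient" step, i.e. not back-dated
}

// isGarbage is the assumed rule for blocks Data Manager should delete.
func isGarbage(b blockInfo) bool {
	return !b.Recent && !b.InKept
}

// categories maps a label to a predicate, one per bullet in the list.
var categories = map[string]func(blockInfo) bool{
	"shared by deleted and non-deleted": func(b blockInfo) bool { return b.InKept && b.InDeleted },
	"deleted-only, recent":              func(b blockInfo) bool { return !b.InKept && b.InDeleted && b.Recent },
	"unreferenced, recent":              func(b blockInfo) bool { return !b.InKept && !b.InDeleted && b.Recent },
	"unreferenced, not recent":          func(b blockInfo) bool { return !b.InKept && !b.InDeleted && !b.Recent },
	"referenced, not recent":            func(b blockInfo) bool { return b.InKept && !b.Recent },
	"garbage":                           isGarbage,
}

// missingCategories returns, in sorted order, the labels of categories
// that no block in the test data falls into.
func missingCategories(blocks map[string]blockInfo) []string {
	var missing []string
	for label, match := range categories {
		found := false
		for _, b := range blocks {
			if match(b) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, label)
		}
	}
	sort.Strings(missing)
	return missing
}
```

The test would fail early if @missingCategories@ returns anything, so a change to the fixtures cannot silently stop exercising one of the cases.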
_Ideally,_ the assessment of whether a block has been "deleted" should compare desired to actual replication level -- this way, the test won't start failing when data manager starts deleting some copies of over-replicated non-garbage blocks. But if this part threatens to be non-trivial, we should defer: better to get the issue at hand done, and adjust when needed.
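A minimal sketch of that desired-versus-actual comparison, assuming the block index from each keepstore has already been parsed into a list of block hashes. @countReplicas@ and @underReplicated@ are hypothetical helpers, and counting at most one replica per keepstore server is an assumption, not something stated above.

```go
package main

import "sort"

// countReplicas tallies how many keepstore servers list each block.
// indexes maps a server name to the block hashes in its index; a block
// listed more than once by the same server is counted once.
func countReplicas(indexes map[string][]string) map[string]int {
	actual := make(map[string]int)
	for _, blocks := range indexes {
		seen := make(map[string]bool)
		for _, hash := range blocks {
			if !seen[hash] {
				seen[hash] = true
				actual[hash]++
			}
		}
	}
	return actual
}

// underReplicated returns, in sorted order, the blocks that have fewer
// copies than desired. Blocks with more copies than desired are not
// reported, so trimming over-replicated blocks would not fail the test.
func underReplicated(desired, actual map[string]int) []string {
	var short []string
	for hash, want := range desired {
		if actual[hash] < want {
			short = append(short, hash)
		}
	}
	sort.Strings(short)
	return short
}
```

With this approach, a non-garbage block counts as wrongly deleted only when its copy count drops below the desired level; whether garbage blocks have actually disappeared still needs its own check.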
When this is done and we are satisfied it's effective, keepstore will no longer need to force @never_delete=true@. See #6221.