Idea #6260
Updated by Tom Clegg over 9 years ago
Write one or more integration tests to verify that Data Manager's _existing_ delete functionality (i.e., deletes blocks that are unreferenced; does not do anything about overreplicated blocks) works as desired:

* Verify that blocks not referenced in a collection are deleted from keepstore
* Verify that all blocks referenced from collections, and all blocks newer than the block signature TTL, are never deleted from keepstore

Minimal test, covering a miniature version of normal operation:

# bring up api and keepstore services (just like we do already in keepproxy_test.go)
# store some collections (with non-zero data)
# store some data blocks without referencing them in any collection
# back-date the block Mtimes so that keepstore will consider unreferenced data old enough to delete
# write some "transient" blocks as well, this time without back-dating their Mtimes
# get block index from all keepstores
# run data manager in "single run" mode
# wait for all keepstores to finish working their trash and pull lists (i.e., /status.json reports @status["PullQueue"]["Queued"]==0 && status["PullQueue"]["InProgress"]==0 && ...@)
# get block index from all keepstores, make sure nothing has been deleted except the back-dated unreferenced blocks
# make API calls to delete some of the collections
# reduce replication on some of the collections
# run data manager again in "single run" mode
# wait for all keepstores to finish working their trash and pull lists
# get block index from all keepstores, make sure:
#* all blocks appearing in non-deleted collections were not deleted
#* all non-recent blocks appearing only in deleted collections were deleted

Along the way, the test suite must confirm that the test data includes

* some blocks that appear in at least one "non-deleted" and at least one "deleted" collection
* some blocks that appear in at least one "deleted" collection and no "non-deleted" collections, and are recent (i.e., written in the "transient" step, and therefore are not garbage)
* some blocks that never appeared in any collections, but are recent
* some blocks that never appeared in any collections, and are not recent
* of course, some blocks that actually get garbage-collected (i.e., the "garbage" set must not be empty!)

_Ideally,_ the assessment of whether a block has been "deleted" should compare desired to actual replication level -- this way, the test won't start failing when data manager starts deleting some copies of over-replicated non-garbage blocks. But if this part threatens to be non-trivial, we should defer: better to get the issue at hand done, and adjust when needed.

When this is done and we are satisfied it's effective, keepstore will no longer need to force @never_delete=true@. See #6221
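
The final "get block index, make sure..." comparison could be sketched in Go roughly as follows. This is only an illustration of the set logic, not Data Manager or keepstore code: @BlockSet@, @Scenario@, @ExpectedAfter@ and @Verify@ are hypothetical names, and block locators are reduced to plain strings. It assumes the test has already collected the block indexes from all keepstores and knows which blocks are referenced by non-deleted collections and which are newer than the block signature TTL.

```go
package main

import "sort"

// BlockSet is a set of block locators (hypothetical helper type).
type BlockSet map[string]bool

// Scenario holds what the test knows before running Data Manager.
// All field names are assumptions made for this sketch.
type Scenario struct {
	Before     BlockSet // block index gathered before the run
	Referenced BlockSet // blocks in at least one non-deleted collection
	Recent     BlockSet // blocks newer than the block signature TTL
}

// ExpectedAfter returns the blocks that must survive the run:
// anything that is still referenced or still recent.
func (s Scenario) ExpectedAfter() BlockSet {
	keep := BlockSet{}
	for b := range s.Before {
		if s.Referenced[b] || s.Recent[b] {
			keep[b] = true
		}
	}
	return keep
}

// Verify compares the index gathered after the run with the expectation
// and returns a sorted list of problems (empty means the run was correct).
func (s Scenario) Verify(after BlockSet) []string {
	var problems []string
	expected := s.ExpectedAfter()
	for b := range expected {
		if !after[b] {
			problems = append(problems, "wrongly deleted: "+b)
		}
	}
	for b := range s.Before {
		if !expected[b] && after[b] {
			problems = append(problems, "garbage not deleted: "+b)
		}
	}
	// The test data is only meaningful if something was eligible for deletion.
	if len(expected) == len(s.Before) {
		problems = append(problems, "test data contains no garbage")
	}
	sort.Strings(problems)
	return problems
}
```

A test would build a @Scenario@ from the pre-run index and fail if @Verify@ returns any problems. The last check mirrors the requirement that the "garbage" set must not be empty. Comparing desired to actual replication level, as suggested above, would need per-block copy counts instead of a plain set.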