Idea #6260
Updated by Tom Clegg over 9 years ago
Write one or more integration tests to verify that Data Manager's _existing_ delete functionality (i.e., deletes blocks that are unreferenced; does not do anything about overreplicated blocks) works as desired:

* Verify that blocks not referenced in a collection are deleted from keepstore
* Verify that all blocks referenced from collections, and all blocks newer than the block signature TTL, are never deleted from keepstore

Minimal test, covering a miniature version of normal operation:

# bring up api and keepstore services (just like we do already in keepproxy_test.go)
# store some collections (with non-zero data)
# store some data blocks without referencing them in any collection
# back-date the block Mtimes so that keepstore will consider unreferenced data old enough to delete
# write some "transient" blocks as well, this time without back-dating their Mtimes
# get block index from all keepstores
# run data manager in "single run" mode
# wait for all keepstores to finish working their trash and pull lists (i.e., /status.json reports @status["PullQueue"]["Queued"]==0 && status["PullQueue"]["InProgress"]==0 && ...@)
# get block index from all keepstores, make sure nothing has been deleted except blocks that are both back-dated _and_ unreferenced
# make API calls to delete some of the collections
# reduce replication on some of the collections
# run data manager again in "single run" mode
# wait for all keepstores to finish working their trash and pull lists
# get block index from all keepstores, make sure:
#* all blocks appearing in non-deleted collections were not deleted
#* all non-recent blocks appearing only in deleted collections were deleted

Along the way, the test suite must confirm that the test data includes

* some blocks that appear in at least one "non-deleted" and at least one "deleted" collection
* some blocks that appear in at least one "deleted" collection and no "non-deleted" collections, and are recent (i.e., written in the "transient" step, and therefore are not garbage)
* some blocks that never appeared in any collections, but are recent
* some blocks that never appeared in any collections, and are not recent
* of course, some blocks that actually get garbage-collected (i.e., the "garbage" set must not be empty!)

_Ideally,_ the assessment of whether a block has been "deleted" should compare desired to actual replication level -- this way, the test won't start failing when data manager starts deleting some copies of over-replicated non-garbage blocks. But if this part threatens to be non-trivial, we should defer: better to get the issue at hand done, and adjust when needed.

When this is done and we are satisfied it's effective, keepstore will no longer need to force @never_delete=true@. See #6221
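
The "get block index ... make sure" steps above could be sketched roughly as follows. This is only an illustration of the rule described in this issue (a block is garbage when it is unreferenced _and_ older than the signature TTL), not existing Data Manager or keepstore code. The @Block@ type and the @isGarbage@ and @checkIndex@ helpers are hypothetical names, and the sketch assumes Mtimes and the TTL are expressed in seconds and that a block exactly at the TTL boundary counts as recent.

```go
package main

import (
	"fmt"
	"sort"
)

// Block is a hypothetical, reduced view of one keepstore index entry.
// Mtime is assumed to be a Unix timestamp in seconds.
type Block struct {
	Locator string
	Mtime   int64
}

// isGarbage reports whether a block may be deleted: it must be
// unreferenced by every current collection AND older than the TTL.
func isGarbage(b Block, referenced map[string]bool, now, ttl int64) bool {
	if referenced[b.Locator] {
		return false
	}
	return now-b.Mtime > ttl
}

// checkIndex compares the index taken before a data manager run with the
// set of locators still present afterwards. It returns a sorted list of
// violations and the number of blocks classified as garbage, so the
// caller can also confirm that the garbage set was not empty.
func checkIndex(before []Block, after, referenced map[string]bool, now, ttl int64) (problems []string, garbage int) {
	for _, b := range before {
		g := isGarbage(b, referenced, now, ttl)
		present := after[b.Locator]
		switch {
		case g && present:
			problems = append(problems, "garbage block not deleted: "+b.Locator)
		case !g && !present:
			problems = append(problems, "live block deleted: "+b.Locator)
		}
		if g {
			garbage++
		}
	}
	sort.Strings(problems)
	return
}

func main() {
	const now, ttl = int64(1000000), int64(3600)
	before := []Block{
		{Locator: "referenced-old", Mtime: now - 2*ttl},
		{Locator: "unreferenced-old", Mtime: now - 2*ttl},
		{Locator: "unreferenced-recent", Mtime: now - 60},
	}
	referenced := map[string]bool{"referenced-old": true}
	after := map[string]bool{"referenced-old": true, "unreferenced-recent": true}

	problems, garbage := checkIndex(before, after, referenced, now, ttl)
	fmt.Println("problems:", problems, "garbage blocks:", garbage)
}
```

In a real test the @before@ and @after@ arguments would be built from the keepstore index responses, and @referenced@ from the manifests of the collections that still exist. A test would fail if @problems@ is non-empty or if @garbage@ is zero. This sketch treats a block as deleted only when no copy remains, so it does not cover the replication-level comparison mentioned as the ideal.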