Idea #6260

Updated by Tom Clegg over 8 years ago

Write one or more integration tests to verify that Data Manager's _existing_ delete functionality (i.e., it deletes unreferenced blocks and does nothing about over-replicated blocks) works as desired: 

 * Verify that blocks not referenced in a collection are deleted from keepstore  
 * Verify that all blocks referenced from collections, and all blocks newer than the block signature TTL, are never deleted from keepstore 

 Minimal test, covering a miniature version of normal operation: 
 # bring up API and keepstore services (just like we do already in keepproxy_test.go) 
 # store some collections (with non-zero data) 
 # store some data blocks without referencing them in any collection 
 # back-date the block Mtimes so that keepstore will consider unreferenced data old enough to delete 
 # write some "transient" blocks as well, this time without back-dating their Mtimes 
 # get block index from all keepstores 
 # run data manager in "single run" mode 
 # wait for all keepstores to finish working their trash and pull lists (i.e., /status.json reports @status["PullQueue"]["Queued"]==0 && status["PullQueue"]["InProgress"]==0 && ...@) 
 # get block index from all keepstores, make sure nothing has been deleted except the blocks that are back-dated _and_ unreferenced 
 # make API calls to delete some of the collections 
 # reduce replication on some of the collections 
 # run data manager again in "single run" mode 
 # wait for all keepstores to finish working their trash and pull lists 
 # get block index from all keepstores, make sure: 
 #* all blocks appearing in non-deleted collections were not deleted 
 #* all non-recent blocks appearing only in deleted collections were deleted 

 Along the way, the test suite must confirm that the test data includes: 
 * some blocks that appear in at least one "non-deleted" and at least one "deleted" collection 
 * some blocks that appear in at least one "deleted" collection and no "non-deleted" collections, and are recent (i.e., written in the "transient" step, and therefore are not garbage) 
 * some blocks that never appeared in any collections, but are recent 
 * some blocks that never appeared in any collections, and are not recent 
 * some blocks that do appear in collections, and are not recent (i.e., being referenced by a collection is the only reason they're not garbage) 
 * of course, some blocks that actually get garbage-collected (i.e., the "garbage" set must not be empty!) 
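
One way to check the coverage requirements listed above is to tag every test block with three facts and assert that each required category has at least one member. The sketch below is an assumption about how that could look: @blockInfo@, @category@, @required@ and @missingCategories@ are hypothetical names, and "recent" is taken to mean "written in the transient step without back-dating".

```go
package main

import "fmt"

// blockInfo records what the test knows about one block it stored.
type blockInfo struct {
	InKept    bool // referenced by at least one non-deleted collection
	InDeleted bool // referenced by at least one deleted collection
	Recent    bool // written without back-dating its Mtime
}

// category pairs a human-readable name with a membership predicate.
type category struct {
	Name  string
	Match func(blockInfo) bool
}

// required lists the categories that the test data must cover.
var required = []category{
	{"in both deleted and non-deleted collections", func(b blockInfo) bool { return b.InKept && b.InDeleted }},
	{"only in deleted collections, recent", func(b blockInfo) bool { return b.InDeleted && !b.InKept && b.Recent }},
	{"never in any collection, recent", func(b blockInfo) bool { return !b.InKept && !b.InDeleted && b.Recent }},
	{"never in any collection, not recent", func(b blockInfo) bool { return !b.InKept && !b.InDeleted && !b.Recent }},
	{"in a non-deleted collection, not recent", func(b blockInfo) bool { return b.InKept && !b.Recent }},
	{"garbage (unreferenced by live collections, not recent)", func(b blockInfo) bool { return !b.InKept && !b.Recent }},
}

// missingCategories returns the names of required categories that no block
// in the given test data satisfies. An empty result means full coverage.
func missingCategories(blocks map[string]blockInfo) []string {
	var missing []string
	for _, c := range required {
		found := false
		for _, b := range blocks {
			if c.Match(b) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, c.Name)
		}
	}
	return missing
}

func main() {
	// Placeholder block names; a real test would key on block hashes.
	blocks := map[string]blockInfo{
		"shared-old":       {InKept: true, InDeleted: true},
		"deleted-recent":   {InDeleted: true, Recent: true},
		"orphan-recent":    {Recent: true},
		"orphan-backdated": {},
	}
	fmt.Println("missing categories:", missingCategories(blocks))
}
```

A single block may satisfy more than one category, which is fine: the point is only that no category is empty before the deletion assertions are trusted.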

 Minimal set of error cases to test: 
 * Network error when getting collection list from API server. Must stop without deleting anything. 
 * Configure with a token that is valid, but does not belong to an admin user (so "list collections" API will succeed, but not all collections will be returned). Must stop without deleting anything. 
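
For both error cases, "must stop without deleting anything" can be asserted by comparing block index snapshots taken before and after the run. The following is a minimal sketch under the assumption that the test has already parsed each keepstore's index into a set of block hashes. @blockIndex@ and @missingBlocks@ are hypothetical names, not existing helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// blockIndex maps a keepstore identifier to the set of block hashes it holds.
type blockIndex map[string]map[string]bool

// missingBlocks lists every "server/hash" pair that is present in before
// but absent from after, in sorted order. An empty result means nothing
// was deleted between the two snapshots.
func missingBlocks(before, after blockIndex) []string {
	var missing []string
	for server, blocks := range before {
		for hash, present := range blocks {
			if present && !after[server][hash] {
				missing = append(missing, server+"/"+hash)
			}
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Placeholder server names and hashes, for illustration only.
	before := blockIndex{
		"keep0": {"aaaa": true, "bbbb": true},
		"keep1": {"cccc": true},
	}
	after := blockIndex{
		"keep0": {"aaaa": true},
		"keep1": {"cccc": true},
	}
	fmt.Println("deleted:", missingBlocks(before, after))
}
```

The same comparison could serve the normal-operation steps too, by checking that the returned list contains exactly the expected garbage blocks.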

 _Ideally,_ the assessment of whether a block has been "deleted" should compare desired to actual replication level -- this way, the test won't start failing when data manager starts deleting some copies of over-replicated non-garbage blocks. But if this part threatens to be non-trivial, we should defer: better to get the issue at hand done, and adjust when needed. 
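
A replication-aware check along these lines might look like the sketch below. It rests on assumptions not stated in this ticket: "desired" is taken to be the highest replication level requested by any non-deleted collection referencing the block (0 if none), and "actual" is the number of copies found across all keepstore indexes. The names @replication@ and @verdict@ are hypothetical.

```go
package main

import "fmt"

// replication describes one block after a data manager run.
type replication struct {
	Desired int  // assumed: highest level wanted by any non-deleted collection, 0 if unreferenced
	Actual  int  // copies found across all keepstore indexes
	Recent  bool // newer than the block signature TTL
}

// verdict classifies a block. Extra copies of a referenced block are
// accepted, so the check keeps passing if over-replicated copies start
// being trimmed down to the desired level.
func verdict(r replication) string {
	switch {
	case r.Actual < r.Desired:
		return "under-replicated"
	case r.Desired == 0 && !r.Recent && r.Actual > 0:
		return "garbage not collected"
	case r.Desired == 0 && r.Recent && r.Actual == 0:
		return "recent block deleted"
	default:
		return "ok"
	}
}

func main() {
	// Placeholder block names, for illustration only.
	blocks := map[string]replication{
		"referenced, extra copy": {Desired: 2, Actual: 3},
		"referenced, lost copy":  {Desired: 2, Actual: 1},
		"old garbage, removed":   {Desired: 0, Actual: 0},
		"old garbage, lingering": {Desired: 0, Actual: 1},
		"recent, unreferenced":   {Desired: 0, Actual: 2, Recent: true},
	}
	for name, r := range blocks {
		fmt.Printf("%s: %s\n", name, verdict(r))
	}
}
```

Whether @Actual < Desired@ should fail a deletion test at all is a judgment call, since it can also be caused by pull work that has nothing to do with trash handling.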

 When this is done and we are satisfied it's effective, keepstore will no longer need to force @never_delete=true@. See #6221 
