Bug #8878
closed
Keep: sudden appearance of "missing" blocks
Description
I had done a "garbage collection" before Easter as follows:
2016/03/24 17:06:10 Read and processed 417 collections
2016/03/24 17:06:13 Blocks In Collections: 514668, Blocks In Keep: 961866.
2016/03/24 17:06:13 Replication Block Counts: Missing From Keep: 0, Under Replicated: 0, Over Replicated: 1650, Replicated Just Right: 513018, Not In Any Collection: 447198.
Replication Collection Counts: Missing From Keep: 0, Under Replicated: 0, Over Replicated: 11, Replicated Just Right: 406.
2016/03/24 17:06:13 Blocks Histogram:
2016/03/24 17:06:13 {Requested:0 Actual:1}: 444455
2016/03/24 17:06:13 {Requested:0 Actual:2}: 2743
2016/03/24 17:06:13 {Requested:1 Actual:1}: 513018
2016/03/24 17:06:13 {Requested:1 Actual:2}: 1647
2016/03/24 17:06:13 {Requested:1 Actual:3}: 3
2016/03/24 17:06:15 Sending trash list to http://keep9.gcam1.example.com:25107
2016/03/24 17:06:15 Sending trash list to http://keep3.gcam1.example.com:25107
2016/03/24 17:06:15 Sending trash list to http://keep6.gcam1.example.com:25107
2016/03/24 17:06:15 Sending trash list to http://keep5.gcam1.example.com:25107
2016/03/24 17:06:15 Sending trash list to http://keep0.gcam1.example.com:25107
2016/03/24 17:06:15 Sending trash list to http://keep4.gcam1.example.com:25107
2016/03/24 17:06:15 Sending trash list to http://keep7.gcam1.example.com:25107
2016/03/24 17:06:15 Sending trash list to http://keep8.gcam1.example.com:25107
2016/03/24 17:06:15 Sending trash list to http://keep1.gcam1.example.com:25107
2016/03/24 17:06:15 Sent trash list to http://keep1.gcam1.example.com:25107: response was HTTP 200 OK
2016/03/24 17:06:15 Sent trash list to http://keep0.gcam1.example.com:25107: response was HTTP 200 OK
2016/03/24 17:06:16 Sent trash list to http://keep4.gcam1.example.com:25107: response was HTTP 200 OK
2016/03/24 17:06:16 Sent trash list to http://keep9.gcam1.example.com:25107: response was HTTP 200 OK
2016/03/24 17:06:16 Sent trash list to http://keep3.gcam1.example.com:25107: response was HTTP 200 OK
2016/03/24 17:06:16 Sent trash list to http://keep5.gcam1.example.com:25107: response was HTTP 200 OK
2016/03/24 17:06:16 Sent trash list to http://keep8.gcam1.example.com:25107: response was HTTP 200 OK
2016/03/24 17:06:16 Sent trash list to http://keep7.gcam1.example.com:25107: response was HTTP 200 OK
2016/03/24 17:06:16 Sent trash list to http://keep6.gcam1.example.com:25107: response was HTTP 200 OK
Then, after uploading two 4GB collections over the past week, we deleted the two 4GB collections that they were meant to replace. I then ran Data Manager again in dry-run mode, and the outcome was:
2016/04/04 12:51:17 Read and processed 421 collections
2016/04/04 12:51:19 Blocks In Collections: 782548, Blocks In Keep: 716788.
2016/04/04 12:51:19 Replication Block Counts: Missing From Keep: 65760, Under Replicated: 0, Over Replicated: 41180, Replicated Just Right: 675608, Not In Any Collection: 0.
Replication Collection Counts: Missing From Keep: 3, Under Replicated: 0, Over Replicated: 13, Replicated Just Right: 405.
2016/04/04 12:51:19 Blocks Histogram:
2016/04/04 12:51:19 {Requested:1 Actual:0}: 65760
2016/04/04 12:51:19 {Requested:1 Actual:1}: 675608
2016/04/04 12:51:19 {Requested:1 Actual:2}: 41177
2016/04/04 12:51:19 {Requested:1 Actual:3}: 3
It is disconcerting to see {Requested:1 Actual:0}: 65760 (around 4GiB) but also {Requested:1 Actual:2}: 41177 (around 2.5GiB).
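As a cross-check on the numbers above, the summary counts can be re-derived from the histogram lines alone. The following is only a sketch: the function name `summarize` and the bucketing rules are my own reading of the log output (for example, that "Over Replicated" counts blocks with Actual > Requested > 0), not something taken from the Data Manager source.

```python
import re

# Histogram lines copied from the 2016/04/04 dry run above.
APRIL_HISTOGRAM = """
{Requested:1 Actual:0}: 65760
{Requested:1 Actual:1}: 675608
{Requested:1 Actual:2}: 41177
{Requested:1 Actual:3}: 3
"""

HIST_RE = re.compile(r"\{Requested:(\d+) Actual:(\d+)\}: (\d+)")


def summarize(log_text):
    """Re-derive the summary counts from the histogram lines (assumed rules)."""
    hist = {(int(r), int(a)): int(n) for r, a, n in HIST_RE.findall(log_text)}

    def total(pred):
        return sum(n for (r, a), n in hist.items() if pred(r, a))

    return {
        "in_collections": total(lambda r, a: r > 0),
        "in_keep": total(lambda r, a: a > 0),
        "missing": total(lambda r, a: r > 0 and a == 0),
        "over_replicated": total(lambda r, a: a > r > 0),
        "just_right": total(lambda r, a: r > 0 and a == r),
        "not_in_any_collection": total(lambda r, a: r == 0 and a > 0),
    }


if __name__ == "__main__":
    for key, value in summarize(APRIL_HISTOGRAM).items():
        print(key, value)
```

With these rules the histogram is at least consistent with the reported totals (782548 blocks in collections, 716788 in Keep), so the summary lines and the histogram agree with each other; that says nothing about why the blocks are missing.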
The two collections that were uploaded to replace the two that were deleted should have been identical to them byte for byte, as the re-uploads were made from the same files using exactly the same file list.
A question I have is whether there is a tool that can tell me which collections and files within them have missing hashes. I think that I can easily modify some of my scripts to that purpose, so I would like to know if there is a tool that I can use as a double check.
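To make the first question concrete, this is roughly the kind of script I have in mind. It is a sketch only: it assumes I can already obtain the set of missing block hashes from somewhere (the dry-run summary above prints only counts), and it works directly on a collection's manifest text, where each line is a stream name followed by block locators (`md5+size`, optionally with hints) and file tokens (`position:size:name`). The function name `files_with_missing_blocks` is hypothetical.

```python
import re

LOCATOR_RE = re.compile(r"^([0-9a-f]{32})\+(\d+)(\+\S+)?$")
FILE_RE = re.compile(r"^(\d+):(\d+):(\S+)$")


def files_with_missing_blocks(manifest_text, missing_hashes):
    """Map each affected file path to the set of missing block hashes it uses.

    missing_hashes is assumed to be a set of 32-character MD5 hex strings.
    """
    affected = {}
    for line in manifest_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        stream_name = tokens[0]
        blocks = []  # (start offset, end offset, md5) within this stream
        offset = 0
        for tok in tokens[1:]:
            loc = LOCATOR_RE.match(tok)
            if loc:
                size = int(loc.group(2))
                blocks.append((offset, offset + size, loc.group(1)))
                offset += size
                continue
            seg = FILE_RE.match(tok)
            if not seg:
                continue  # unrecognised token; ignored in this sketch
            pos, size = int(seg.group(1)), int(seg.group(2))
            # Spaces in file names are escaped as \040 in manifests.
            path = stream_name + "/" + seg.group(3).replace("\\040", " ")
            for start, end, md5 in blocks:
                overlaps = size > 0 and start < pos + size and pos < end
                if overlaps and md5 in missing_hashes:
                    affected.setdefault(path, set()).add(md5)
    return affected


if __name__ == "__main__":
    a, b = "a" * 32, "b" * 32  # made-up hashes for illustration
    manifest = ". %s+3 %s+3 0:3:a.txt 3:3:b.txt 2:2:c.txt\n" % (a, b)
    print(files_with_missing_blocks(manifest, {b}))
```

In the made-up example, `a.txt` lies entirely in the first block and is not reported, while `b.txt` and `c.txt` both touch the second (missing) block.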
The other question is whether I can run further consistency checks with Data Manager, for example verifying the hashes of the data blocks.
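For reference, the check I mean is simple in principle, since a Keep locator begins with the MD5 of the block content followed by its size. A minimal sketch, assuming the block bytes have already been read by some other means (fetching them is deliberately left out, and `verify_block` is a hypothetical helper, not an existing tool):

```python
import hashlib


def verify_block(data, locator):
    """Return True if data matches the md5 and size encoded in the locator.

    locator is expected to look like "<md5 hex>+<size>[+hints...]".
    """
    parts = locator.split("+")
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError("locator lacks a size field: %r" % locator)
    expected_md5, expected_size = parts[0], int(parts[1])
    return (len(data) == expected_size
            and hashlib.md5(data).hexdigest() == expected_md5)


if __name__ == "__main__":
    payload = b"foo"
    good = hashlib.md5(payload).hexdigest() + "+3"
    print(verify_block(payload, good))         # block matches its locator
    print(verify_block(b"fox", good))          # same size, different content
```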