Feature #15125
Updated by Tom Clegg over 5 years ago
A site admin, upon suspecting keep-balance is erroneously trashing some data, should be able to:

* act quickly to minimize the impact, and
* characterize the damage, if any

Steps to minimize the impact:

* immediately prevent keepstore from trashing or deleting any blocks while investigation/recovery proceeds
* untrash any blocks that might have been trashed erroneously (this may enable affected workflows to resume)

Steps to characterize the damage:

* get a list of missing block IDs
* get a list of collections that reference missing blocks (including uuid, pdh, name, project uuid, project name)

Troubleshooting:

* report version in metrics (e.g., @version{program="keep-balance", version="1.3.1"} = 1@)
* report number and size of trashed blocks in metrics
* keepstore "untrash all" management API
* keep-balance reporting option to get debug info for a list of specific collection IDs and block IDs (without getting the entire debug dump, which is huge)
* @keep-block-check --collection=uuid_or_pdh@
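
To illustrate the "collections that reference missing blocks" report, here is a minimal Python sketch. It is not an existing Arvados tool or SDK call. It assumes the collection records have already been fetched into plain dicts with the usual @uuid@, @portable_data_hash@, @name@, @owner_uuid@ and @manifest_text@ fields, that block tokens in the manifest start with a 32-digit hex hash followed by @+size@, and that project names are supplied in a separate mapping. The function names @blocks_in_manifest@ and @affected_collections@ and the @project_names@ parameter are hypothetical.

```python
import re

# A block token in a manifest starts with a 32-hex-digit hash and "+size",
# optionally followed by further "+hint" parts (assumed format).
LOCATOR_RE = re.compile(r"^([0-9a-f]{32})\+\d+(\+\S+)*$")


def blocks_in_manifest(manifest_text):
    """Return the set of block hashes referenced by a manifest."""
    hashes = set()
    for line in manifest_text.splitlines():
        tokens = line.split(" ")
        # tokens[0] is the stream name; file tokens such as "0:3:foo.txt"
        # do not match LOCATOR_RE and are skipped.
        for tok in tokens[1:]:
            m = LOCATOR_RE.match(tok)
            if m:
                hashes.add(m.group(1))
    return hashes


def affected_collections(collections, missing_hashes, project_names=None):
    """List collections that reference at least one missing block.

    collections:    iterable of dicts (hypothetical, pre-fetched records)
    missing_hashes: iterable of 32-hex-digit block hashes
    project_names:  optional dict mapping owner uuid -> project name
    """
    missing = set(missing_hashes)
    project_names = project_names or {}
    report = []
    for coll in collections:
        lost = blocks_in_manifest(coll["manifest_text"]) & missing
        if lost:
            report.append({
                "uuid": coll["uuid"],
                "pdh": coll["portable_data_hash"],
                "name": coll["name"],
                "project_uuid": coll["owner_uuid"],
                "project_name": project_names.get(coll["owner_uuid"]),
                "missing_blocks": sorted(lost),
            })
    return report
```

Each entry in the returned list carries the fields named in the requirement (uuid, pdh, name, project uuid, project name) plus the missing block hashes found in that collection. Matching on the hash alone, rather than the full locator, is a design choice in this sketch so that permission or location hints do not affect the comparison.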