Feature #15125

Updated by Tom Clegg about 5 years ago

A site admin who suspects keep-balance is erroneously trashing data should be able to:
 * act quickly to minimize the impact, and 
 * characterize the damage, if any 

 Steps to minimize the impact: 
 * immediately prevent keepstore from trashing or deleting any blocks while investigation/recovery proceeds 
 * untrash any blocks that might have been trashed erroneously (this may enable affected workflows to resume) 

 Steps to characterize the damage: 
 * get a list of missing block IDs 
 * get a list of collections that reference missing blocks (including uuid, pdh, name, project uuid, project name) 
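
 The second step can be sketched as follows, assuming the list of missing block IDs is already available as a set of MD5 hashes (without size or hints) and that collection records have been fetched as plain dicts with the fields @uuid@, @portable_data_hash@, @name@, @owner_uuid@ and @manifest_text@. The function names and the output shape are hypothetical, and resolving the project name from @owner_uuid@ is left out. This is an illustration, not the tooling the ticket asks for.

```python
import re

# A block locator starts with a 32-hex-digit MD5 and a size, optionally
# followed by hints (for example a +A permission signature).
LOCATOR_RE = re.compile(r"^([0-9a-f]{32})\+\d+(\+\S+)*$")


def block_hashes(manifest_text):
    """Return the set of block MD5 hashes referenced by a manifest."""
    hashes = set()
    for line in manifest_text.splitlines():
        # Token 0 is the stream name; locators follow it, up to the
        # first pos:size:filename token.
        for token in line.split(" ")[1:]:
            match = LOCATOR_RE.match(token)
            if match is None:
                break  # reached the file tokens
            hashes.add(match.group(1))
    return hashes


def affected_collections(collections, missing):
    """List collections whose manifests reference any missing block.

    `collections` is an iterable of dicts (assumed shape, see above);
    `missing` is a set of MD5 hashes.
    """
    report = []
    for coll in collections:
        hit = block_hashes(coll["manifest_text"]) & missing
        if hit:
            report.append({
                "uuid": coll["uuid"],
                "pdh": coll["portable_data_hash"],
                "name": coll["name"],
                "project_uuid": coll["owner_uuid"],
                "missing_blocks": sorted(hit),
            })
    return report
```

 Matching on the bare hash rather than the full locator keeps the comparison independent of signature hints, which differ between requests.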

 Troubleshooting: 
 * report version in metrics (e.g., @version{program="keep-balance", version="1.3.1"} = 1@) 
 * report number and total size of trashed blocks in metrics 
 * keepstore "untrash all" management API 
 * keep-balance reporting option to get debug info for a list of specific collection IDs and block IDs (without producing the entire debug dump, which is huge) 
 * @keep-block-check --collection=uuid_or_pdh@ 
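
 To illustrate the two metrics items, here is a minimal sketch that renders samples in the Prometheus text exposition style, where the version example above would appear as a gauge with value 1. The helper name @metric_line@ and the metric names @trashed_blocks@ and @trashed_bytes@ are hypothetical; the ticket does not specify how the trashed-block metrics should be named or labelled.

```python
def _escape(value):
    """Escape a label value: backslash, double quote and newline."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def metric_line(name, labels, value):
    """Render one sample as name{label="value",...} value."""
    if not labels:
        return f"{name} {value}"
    label_text = ",".join(
        f'{key}="{_escape(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value}"


# Version reported as a constant gauge, as in the example in the list above.
print(metric_line("version", {"program": "keep-balance", "version": "1.3.1"}, 1))

# Hypothetical names for the count and total size of trashed blocks.
print(metric_line("trashed_blocks", {}, 0))
print(metric_line("trashed_bytes", {}, 0))
```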
