Feature #21887

Updated by Peter Amstutz 9 months ago

We want estimates of storage usage for each project.

 Proposal: 

 Keep-balance tracks storage usage per project and updates the project records at the end of each run, similar to how it currently maintains the replication status fields for collections. 

 For each project, it should record: 

 * Perceived usage - approximately the sum of file_size_total over the collections in the project 
 * Project deduplicated usage - each unique block referenced by the project counted exactly once 
 * Whole cluster deduplicated usage - each unique block's size multiplied by the ratio (appearances of that block in the project / appearances of that block in the whole cluster) 
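
 A minimal sketch of how the three figures could be computed, to make the definitions concrete. This is not keep-balance code: the function name, the input shape, and the choice to count each block reference in a collection as one "appearance" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def project_usage(collections):
    """Compute the three proposed usage figures per project.

    `collections` is a hypothetical input: an iterable of
    (project_id, file_size_total, [(block_hash, block_size), ...]).
    Assumption: every block reference counts as one "appearance".
    """
    perceived = defaultdict(int)
    block_size = {}
    per_project = defaultdict(Counter)  # project -> block -> appearances
    cluster = Counter()                 # block -> appearances cluster-wide

    for project, file_size_total, blocks in collections:
        perceived[project] += file_size_total
        counts = per_project[project]   # ensures empty projects are reported
        for block_hash, size in blocks:
            block_size[block_hash] = size
            counts[block_hash] += 1
            cluster[block_hash] += 1

    report = {}
    for project, counts in per_project.items():
        report[project] = {
            # Roughly sum(file_size_total) of the project's collections.
            "perceived": perceived[project],
            # Each unique block in the project counted exactly once.
            "project_dedup": sum(block_size[h] for h in counts),
            # Block size weighted by the project's share of appearances.
            "cluster_dedup": sum(
                Fraction(block_size[h] * n, cluster[h])
                for h, n in counts.items()
            ),
        }
    return report


example = [
    ("A", 150, [("b1", 100), ("b2", 50)]),
    ("A", 100, [("b1", 100)]),
    ("B", 100, [("b1", 100)]),
]
usage = project_usage(example)
```

 Under this definition the whole cluster figures of all projects add up to the total size of the unique blocks (150 bytes in the example), so the metric apportions shared blocks across projects without double counting.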

 Related: 

 The total bytes count tallies manifest _streams_ rather than _files_, so a collection with a single 10 KiB file embedded in a 64 MiB block is counted as 64 MiB, not 10 KiB. This seems to result in somewhat exaggerated deduplication ratios. There should also be a metric that counts only the 10 KiB actually referenced. 
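
 The difference between the two counts can be illustrated with a rough sketch that reads a manifest and sums block sizes and file segment sizes separately. The function name is hypothetical, the block hash in the example is a placeholder, and the parser only handles the simple case: it assumes block locators of the form hash+size and file tokens of the form position:size:name, and it adds up segment sizes without checking whether two segments overlap.

```python
import re

# Assumed token shapes: "<32 hex chars>+<size>[+hints]" and "<pos>:<size>:<name>".
LOCATOR_RE = re.compile(r"^[0-9a-f]{32}\+(\d+)")
SEGMENT_RE = re.compile(r"^(\d+):(\d+):(.+)$")


def manifest_byte_counts(manifest_text):
    """Return (block_bytes, referenced_bytes) for a manifest.

    block_bytes sums the sizes of all blocks listed in each stream;
    referenced_bytes sums only the file segment sizes.
    """
    block_bytes = 0
    referenced_bytes = 0
    for line in manifest_text.splitlines():
        tokens = line.split(" ")
        # tokens[0] is the stream name; the rest are locators and segments.
        for token in tokens[1:]:
            segment = SEGMENT_RE.match(token)
            if segment:
                referenced_bytes += int(segment.group(2))
                continue
            locator = LOCATOR_RE.match(token)
            if locator:
                block_bytes += int(locator.group(1))
    return block_bytes, referenced_bytes


# A single 10 KiB file stored inside a 64 MiB block (placeholder hash).
sample = ". acbd18db4cc2f85cedef654fccc4a4d8+67108864 0:10240:small.txt\n"
block_bytes, referenced_bytes = manifest_byte_counts(sample)
```

 For the sample manifest the stream-based count is 67108864 bytes while the referenced count is 10240 bytes, which is the gap described above.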
