Feature #21887

open

Keep-balance records storage usage per-project

Added by Peter Amstutz 5 months ago. Updated about 21 hours ago.

Status: New
Priority: Normal
Assigned To: -
Category: Keep
Target version:
Story points: -

Description

We want estimates of storage usage, broken down per project.

Proposal:

Keep-balance keeps track of storage usage per-project, and updates the project records at the conclusion of each run, similarly to how it currently maintains the replication status fields for collections.

For each project, it should record:

  • Perceived usage - approximately what you would get from sum(file_size_total) of collections in the project
  • Project deduplicated usage - each unique block counted exactly once
  • Whole cluster deduplicated usage - each unique block size multiplied by the ratio of (appearances of that block in project / appearances in whole cluster)
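
As a rough illustration of the three figures above, here is a minimal Python sketch. It is not keep-balance code (keep-balance is a separate service, and nothing here reflects its internals). The input shape, a list of `(project_id, file_size_total, blocks)` tuples, and the function name `project_usage` are assumptions made up for this example, and "appearances" is taken to mean the number of times a block is referenced by collections.

```python
from collections import Counter, defaultdict


def project_usage(collections):
    """Compute the three proposed usage figures per project.

    `collections` is a hypothetical input: an iterable of
    (project_id, file_size_total, blocks), where `blocks` is a list of
    (block_hash, block_size) references taken from the collection manifest.
    """
    perceived = defaultdict(int)      # project -> sum(file_size_total)
    project_refs = defaultdict(Counter)  # project -> block -> reference count
    cluster_refs = Counter()          # block -> reference count, whole cluster
    sizes = {}                        # block -> size in bytes

    for project, file_size_total, blocks in collections:
        perceived[project] += file_size_total
        for block_hash, block_size in blocks:
            project_refs[project][block_hash] += 1
            cluster_refs[block_hash] += 1
            sizes[block_hash] = block_size

    report = {}
    for project in perceived:
        counts = project_refs[project]
        report[project] = {
            # Usage as the user perceives it.
            "perceived": perceived[project],
            # Each unique block in the project counted exactly once.
            "project_dedup": sum(sizes[h] for h in counts),
            # Block size scaled by (references in project / references in cluster).
            "cluster_weighted": sum(
                sizes[h] * n / cluster_refs[h] for h, n in counts.items()
            ),
        }
    return report


# Toy data: project A references block b1 twice, project B references it once.
sample = [
    ("A", 150, [("b1", 100), ("b2", 50)]),
    ("A", 100, [("b1", 100)]),
    ("B", 100, [("b1", 100)]),
]
usage = project_usage(sample)
```

With this weighting, the `cluster_weighted` values of all projects add up to the deduplicated size of the whole cluster (150 bytes in the toy data), so shared blocks are charged proportionally instead of being charged in full to every project that references them.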

Related:

The total byte count adds up the blocks listed in manifest streams rather than the files, so a collection with a single 10 KiB file that is embedded in a 64 MiB block is counted as 64 MiB and not 10 KiB. This seems to result in somewhat exaggerated deduplication ratios. There should be a metric that counts just the 10 KiB referenced.
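
The difference between the two ways of counting can be shown with a small Python sketch. It uses a deliberately simplified reading of the manifest text format (stream name, then block locators of the form `hash+size`, then `position:size:filename` tokens) and is not the Arvados SDK parser. The function name `count_bytes` and the sample manifest are assumptions for illustration only.

```python
import re

# Simplified token patterns, assumed for this sketch:
# block locator: 32 hex digits, "+", size in bytes, optional "+hint" suffixes
BLOCK = re.compile(r"^[0-9a-f]{32}\+(\d+)(\+\S+)?$")
# file segment: position ":" size ":" filename
FILE = re.compile(r"^(\d+):(\d+):(\S+)$")


def count_bytes(manifest_text):
    """Return (bytes counted by block, bytes counted by file segment)."""
    block_bytes = 0
    file_bytes = 0
    for line in manifest_text.splitlines():
        tokens = line.split()
        for token in tokens[1:]:  # tokens[0] is the stream name
            match = BLOCK.match(token)
            if match:
                block_bytes += int(match.group(1))
                continue
            match = FILE.match(token)
            if match:
                file_bytes += int(match.group(2))
    return block_bytes, file_bytes


# One 10 KiB file that lives inside a 64 MiB block.
manifest = ". acbd18db4cc2f85cedef654fccc4a4d8+67108864 0:10240:small.txt\n"
by_block, by_file = count_bytes(manifest)
print(by_block, by_file)  # 67108864 10240
```

Counting by block gives 64 MiB, while counting by file segment gives the 10 KiB that the collection actually references, which is the number the proposed metric would report.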

Actions #1

Updated by Peter Amstutz 5 months ago

  • Subject changed from Report storage usage per-project to Keep-balance records storage usage per-project
Actions #2

Updated by Peter Amstutz 5 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 5 months ago

  • Target version changed from Future to Development 2024-08-28 sprint
Actions #6

Updated by Peter Amstutz 4 months ago

  • Description updated (diff)
Actions #7

Updated by Peter Amstutz 4 months ago

From Matrix discussion:

Peter:

Basically, the deduplication ratio on some clusters seems a little exaggerated.
My suspicion is that it may be over-counting due to the block packing/slicing behavior.
Basically I'd like a number which corresponds to sum(file_size_total) -- the usage as perceived by the user. Also, if you were to export the entire thing to some other storage system, that's what you'd actually use. (I'm assuming file_size_total means what I think it means, but now I have to go check.)

Tom:
I see, that makes sense. So, currently the ratio only tells you block de-duplication, but to compare with a regular filesystem you would want to account for both block de-duplication and block-packing wastage.
I think file_size_total should work.

Actions #8

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-08-28 sprint to Development 2024-09-11 sprint
Actions #9

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-09-11 sprint to Development 2024-09-25 sprint
Actions #10

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-09-25 sprint to Development 2024-10-09 sprint
Actions #11

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Development 2024-10-09 sprint to Development 2024-11-06 sprint
Actions #12

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-11-06 sprint to Development 2024-11-20
Actions #13

Updated by Peter Amstutz 30 days ago

  • Target version changed from Development 2024-11-20 to Development 2024-12-18
Actions #14

Updated by Peter Amstutz about 21 hours ago

  • Target version changed from Development 2024-12-18 to Future