Bug #18000

[deduplicationreport] negative number in the "saved by Keep deduplication" report

Added by Ward Vandewege 2 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

ward@shell:~$ arv collection list --order 'file_size_total desc' --limit 3 | jq -r '.items[] | [.portable_data_hash,.uuid] | @csv' | sed -e 's/"//g' | tr '\n' ' ' | xargs arvados-client deduplication-report
Collection _____-_____-_______________: pdh ________________________________+5003343; nominal size 7382073267640 (6.7 TiB); file count 2796
Collection _____-_____-_______________: pdh ________________________________+4961919; nominal size 6989909625775 (6.4 TiB); file count 5592
Collection _____-_____-_______________: pdh ________________________________+2103205; nominal size 2795436541525 (2.5 TiB); file count 3028

Collections:                               3
Nominal size of stored data:  17167419434940 bytes (16 TiB)
Actual size of stored data:   17170607344506 bytes (16 TiB)
Saved by Keep deduplication:     -3187909566 bytes (16 EiB)

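The actual size is calculated by summing the sizes of the distinct blocks referenced across all collections. I assume the bug is that this calculation does not take into account that a manifest can use only part of a block: the nominal size counts only the bytes the files actually reference, while the actual size counts every referenced block in full, so the actual size can exceed the nominal size and the "saved" figure goes negative. The "(16 EiB)" next to the negative number presumably comes from formatting the negative value as an unsigned 64-bit integer.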
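
The suspected cause can be illustrated with a minimal Python sketch. It is not the arvados-client implementation; the function name `nominal_and_actual` and the sample manifest are hypothetical. It assumes the usual Keep manifest layout (a stream name, then block locators of the form `md5+size`, then file segments of the form `position:length:filename`).

```python
import re

# Block locator: 32 hex digits (md5), "+", block size, optional hints.
LOCATOR = re.compile(r"^([0-9a-f]{32})\+(\d+)")
# File segment: position in stream, length, file name.
SEGMENT = re.compile(r"^(\d+):(\d+):(.+)$")


def nominal_and_actual(manifests):
    """Return (nominal, actual) sizes in bytes for a list of manifest texts.

    nominal: sum of the file segment lengths (bytes the files reference).
    actual:  sum of the full sizes of the distinct blocks referenced.
    """
    blocks = {}  # md5 -> full block size, deduplicated across manifests
    nominal = 0
    for text in manifests:
        for line in text.splitlines():
            for token in line.split()[1:]:  # skip the stream name
                loc = LOCATOR.match(token)
                if loc:
                    blocks[loc.group(1)] = int(loc.group(2))
                    continue
                seg = SEGMENT.match(token)
                if seg:
                    nominal += int(seg.group(2))
    return nominal, sum(blocks.values())


# Hypothetical manifest: one 100-byte block, of which a file uses 40 bytes.
manifest = ". " + "a" * 32 + "+100 0:40:foo.txt\n"
nominal, actual = nominal_and_actual([manifest])
print(nominal, actual, nominal - actual)  # 40 100 -60

# The figures from the report above.
saved = 17167419434940 - 17170607344506
print(saved)  # -3187909566

# Reinterpreting the negative value as an unsigned 64-bit integer
# gives a number just below 2**64 bytes, which rounds to 16 EiB.
wrapped = saved % 2**64
print(f"{wrapped / 2**60:.0f} EiB")  # 16 EiB
```

If this is indeed the cause, counting only the referenced portion of each block, or clamping the "saved" value at zero before formatting, would avoid the negative figure.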

History

#1 Updated by Ward Vandewege 2 months ago

  • Description updated
