Bug #18000

Updated by Ward Vandewege over 2 years ago

<pre> 
 ward@shell:~$ arv collection list --order 'file_size_total desc' --limit 3 |       jq -r '.items[] | [.portable_data_hash,.uuid] |@csv' |sed -e 's/"//g'|tr '\n' ' ' |xargs arvados-client deduplication-report 
 Collection _____-_____-_______________: pdh ________________________________+5003343; nominal size 7382073267640 (6.7 TiB); file count 2796 
 Collection _____-_____-_______________: pdh ________________________________+4961919; nominal size 6989909625775 (6.4 TiB); file count 5592 
 Collection _____-_____-_______________: pdh ________________________________+2103205; nominal size 2795436541525 (2.5 TiB); file count 3028 

 Collections:                                 3 
 Nominal size of stored data:    17167419434940 bytes (16 TiB) 
 Actual size of stored data:     17170607344506 bytes (16 TiB) 
 Saved by Keep deduplication:       -3187909566 bytes (16 EiB) 
 </pre> 

The actual size is calculated as the sum of the sizes of the blocks used across all collections. I assume the bug is that this calculation does not account for a manifest using only part of a block: the nominal size counts only the bytes the files actually reference, so it can come out smaller than the actual size, and the "saved" figure goes negative.
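
To illustrate the suspected cause, here is a minimal Python sketch of the two sums. It is not the `arvados-client` implementation; `report` and `as_unsigned` are hypothetical helper names, and the parsing assumes the usual manifest layout (stream name, then `hash+size` block locators, then `position:size:filename` file segments). The unsigned reinterpretation is only a guess at why the negative value is printed as 16 EiB.

```python
import re

# Assumed token shapes in a manifest stream line (illustrative only):
#   block locator: <32 hex digits>+<size>[+hints...]
#   file segment:  <position>:<size>:<filename>
LOCATOR = re.compile(r"^([0-9a-f]{32})\+(\d+)")
SEGMENT = re.compile(r"^(\d+):(\d+):(.+)$")


def report(manifests):
    """Return (nominal, actual, saved) in bytes for a list of manifest texts.

    nominal: sum of the file segment sizes (bytes the files reference)
    actual:  sum of the sizes of distinct blocks, each counted in full
    saved:   nominal - actual, which is negative when blocks are only
             partially referenced
    """
    nominal = 0
    blocks = {}  # block hash -> full block size, counted once overall
    for text in manifests:
        for line in text.splitlines():
            tokens = line.split()
            for tok in tokens[1:]:  # tokens[0] is the stream name
                m = LOCATOR.match(tok)
                if m:
                    blocks[m.group(1)] = int(m.group(2))
                    continue
                m = SEGMENT.match(tok)
                if m:
                    nominal += int(m.group(2))
    actual = sum(blocks.values())
    return nominal, actual, nominal - actual


def as_unsigned(value):
    """Reinterpret a signed byte count as an unsigned 64-bit integer."""
    return value % 2**64
```

With a single manifest that references only 4 bytes of a 10-byte block, `report` returns a nominal size of 4, an actual size of 10, and a saving of -6. Passing the saving from the report above, -3187909566, through `as_unsigned` gives a value of roughly 2^64 bytes, which would be consistent with the "16 EiB" in the output if the human-readable formatter treats the number as unsigned.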