Idea #21942
Status: Open
Poor performance when a collection consists mostly of small slices of many different large blocks
Start date:
Due date:
Story points: -
Description
User has a collection which consists of several thousand files that are 100-200 bytes each.
Each file was sourced from a different workflow output collection.
When these files were created, small-file packing was applied; as a result, each 100-200 byte file is embedded in a data block that is 50-60 MiB.
Consequently, iterating over this collection and reading each file is much slower than expected: behind the scenes, Arvados must fetch a 50-60 MiB block just to extract the 100-200 byte slice that makes up the file.
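The mechanics can be illustrated with a small sketch. This does not use the Arvados SDK: the single-stream manifest parsing is deliberately simplified, and the block hashes and sizes are illustrative. It computes what fraction of each block a collection's files actually reference, which is the quantity the "< 50% referenced" heuristic below would need:

```python
def block_utilization(manifest_line):
    """Return {locator: fraction_of_block_referenced} for one stream.

    Simplified model of a manifest line: the stream name, then block
    locators ("<hash>+<size>"), then file segments ("pos:size:name")
    addressing the concatenated block stream.
    """
    locators, files = [], []
    for tok in manifest_line.split()[1:]:
        if len(tok.split(":")) == 3:            # "pos:size:name" segment
            pos, size, _name = tok.split(":")
            files.append((int(pos), int(size)))
        else:                                   # "<hash>+<size>" locator
            locators.append((tok, int(tok.split("+")[1])))

    # Offset of each block within the concatenated stream.
    offsets, pos = [], 0
    for loc, size in locators:
        offsets.append((pos, pos + size, loc))
        pos += size

    referenced = {loc: 0 for loc, _ in locators}
    for fpos, fsize in files:
        fend = fpos + fsize
        for start, end, loc in offsets:
            overlap = min(fend, end) - max(fpos, start)
            if overlap > 0:
                referenced[loc] += overlap
    return {loc: referenced[loc] / size for loc, size in locators}

# Two 64 MiB blocks, each with only one tiny file slice referenced
# (hashes are fake placeholders):
line = (". aaaa+67108864 bbbb+67108864 "
        "0:150:f1.txt 67108864:200:f2.txt")
for loc, frac in block_utilization(line).items():
    print(f"{loc}: {frac:.8%} referenced")
```

In the example above, each block is well under 0.001% referenced, yet a naive reader must download both blocks in full.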
Think about ways to behave more efficiently when less than 50% of a given block is referenced by a collection.
A couple ideas:
- Support range requests (interacts poorly with caching, though)
- When constructing/saving a collection like this, do an "optimize" pass that rewrites/repacks the files
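The "optimize" idea could be sketched, under the same simplified manifest model and without touching the real Keep/SDK APIs, as a repack step that concatenates the small files' contents into fresh blocks that are fully referenced. The `repack` function and its block-size parameter are hypothetical, not an existing Arvados interface:

```python
import hashlib

def repack(files, block_size=64 << 20):
    """Toy "optimize" pass: rewrite small files into consolidated blocks.

    `files` is a list of (name, content_bytes). Instead of leaving each
    file as a tiny slice of some huge original block, the contents are
    concatenated and split into fresh blocks that are 100% referenced.
    Returns a simplified single-stream manifest line.
    """
    segments, pos, data = [], 0, b""
    for name, content in files:
        segments.append(f"{pos}:{len(content)}:{name}")
        pos += len(content)
        data += content

    # Split the concatenated stream into blocks of at most block_size.
    locators = []
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        locators.append(f"{hashlib.md5(chunk).hexdigest()}+{len(chunk)}")

    return " ".join([".", *locators, *segments])

# Repacking two tiny files yields one small, fully referenced block:
print(repack([("f1.txt", b"a" * 150), ("f2.txt", b"b" * 200)]))
```

The trade-off is write amplification: the data is stored twice (original packed blocks plus repacked blocks) until the original blocks are garbage-collected, which is why doing this at collection save time, rather than retroactively, may be the cheaper point to intervene.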
Updated by Peter Amstutz 3 months ago
- Subject changed from Poor performance when a collection consists mostly of small slices of many large blocks to Poor performance when a collection consists mostly of small slices of many different large blocks