Feature #20995

closed

Prefetch small files when scanning a collection directory

Added by Peter Amstutz over 1 year ago. Updated 11 months ago.

Status: Duplicate
Priority: Normal
Assigned To:
Category: Keep
Story points: -

Description

When a client reads multiple small files in a collection through keep-web, anticipate that it may go on to read other files in the collection directory that sit adjacent in the data stream to the file being accessed, and prefetch those files in parallel as well.

My thinking is that if a given stream is smaller than X megabytes (256? 512? 1024? configurable?), then keep-web would start a parallel prefetch of the entire stream. In particular, if there are lots of small blocks, we want to have a bunch of prefetch requests in flight.
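A minimal sketch of that idea, in Python for brevity rather than keep-web's Go. Everything here is an assumption for illustration: `fetch_block` is a hypothetical callable that pulls one block into the cache, the 256 MiB default is just the first candidate value mentioned above, and `max_in_flight` is an invented knob for how many prefetch requests run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024


def should_prefetch_stream(block_sizes, threshold_bytes=256 * MIB):
    """Return True if the whole stream is smaller than the threshold."""
    return sum(block_sizes) < threshold_bytes


def prefetch_stream(blocks, fetch_block, threshold_bytes=256 * MIB, max_in_flight=8):
    """Prefetch every block of a small stream in parallel.

    blocks: list of (locator, size_in_bytes) tuples in stream order.
    fetch_block: hypothetical callable that fetches one block by locator.
    Returns the fetch results in stream order, or [] if the stream is
    too large to prefetch wholesale.
    """
    if not should_prefetch_stream([size for _, size in blocks], threshold_bytes):
        return []
    # Keep up to max_in_flight requests outstanding at once.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(fetch_block, [locator for locator, _ in blocks]))
```

With many small blocks, a larger `max_in_flight` keeps more requests outstanding, which is the point of the proposal; the right value would depend on the backend and is not specified here.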

One thought is to use logic similar to large-file block prefetch, where we start from the first read and just read ahead in the stream, except that we ignore file boundaries. If prefetch would take us past the end of the stream, we wrap around and start reading at the beginning.
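The wrap-around read-ahead could be sketched roughly as follows (Python, purely illustrative). It assumes the stream is addressed as a list of block indices and that `window` is a hypothetical limit on how many blocks to read ahead; neither name comes from the existing prefetch code.

```python
def readahead_order(num_blocks, first_read_index, window):
    """Return the block indices to prefetch after the first read.

    Indices advance through the stream ignoring file boundaries and wrap
    to the beginning when they pass the end. The block that triggered the
    read is never included, so at most num_blocks - 1 indices are returned.
    """
    if num_blocks <= 0:
        return []
    count = min(window, num_blocks - 1)
    return [(first_read_index + i) % num_blocks for i in range(1, count + 1)]
```

For a five-block stream where the first read lands on block 3 and the window is 3, this yields blocks 4, 0, 1 in that order.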

The controlling assumptions here are (a) we have a lot of fast cache where we can dump our blocks, and (b) the stream ordering is at least vaguely similar to the order of data access.


Related issues 2 (1 open, 1 closed)

Related to Arvados Epics - Idea #18342: Keep performance optimization (New, 08/01/2023 to 12/31/2024)
Related to Arvados - Feature #18961: Go FileSystem / FUSE mount supports block prefetch (Closed, Tom Clegg)