Idea #3640

[SDKs] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache.

Added by Tom Clegg over 9 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: Keep
Target version: -
Start date:
Due date:
Story points: 2.0

Description

Background:

arv-mount has a block cache, which improves performance when the same blocks are read multiple times. However:
  • Currently a new arv-mount process is started for each Crunch task execution. This means tasks don't share a cache, even if they're running at the same time.
  • In the common case where multiple crunch tasks run at the same time and use the same data, we have multiple arv-mount processes each retrieving and caching its own copy of the same data blocks.
Proposed improvement:
  • Use large swap on worker nodes (preferably SSD). (We already do this for other reasons.)
  • Set up a large tmpfs on worker nodes and use it as crunch job scratch space. (This already gets cleared at the beginning of a job to avoid leakage between jobs/users.)
  • Use a directory in that tmpfs as an arv-mount cache. This makes it feasible to use a large cache size, and makes it easy to share the cache between multiple arv-mount processes.
Implementation notes:
  • Rely on unix permissions for cache privacy. (Warn if the cache dir's mode & 0007 != 0, but go ahead anyway: there will be cases where that would be useful and not dangerous.)
  • Use flock() to avoid races and duplicated effort. (If arv-mount 1 is writing a block to the cache, then arv-mount 2 should wait for arv-mount 1 to finish then read from the cache, rather than fetch its own copy.)
  • Do not clean up cache dir at start/exit, at least by default (the general idea is to share with past/future arv-mount procs). An optional --cache-clear-atexit flag would be nice to have.
  • Measuring/limiting cache size could be interesting
  • Delete & replace upon finding a corrupt/truncated cache entry
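
The implementation notes above can be sketched roughly as follows. This is an illustration only, not the SDK's actual code: `check_cache_dir`, `get_block`, and the `fetch` callable are hypothetical names, and the sketch assumes one cache file per block, named by the block's MD5 hex digest, so that a truncated or corrupt entry can be detected by re-hashing it.

```python
import fcntl
import hashlib
import os
import stat
import warnings


def check_cache_dir(path):
    """Warn (but carry on) if the cache dir grants any access to 'other'.

    Returns True when mode & 0007 == 0, False otherwise.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o007:
        warnings.warn("cache dir %s has mode %o: readable by other users"
                      % (path, mode))
        return False
    return True


def get_block(cache_dir, md5sum, fetch):
    """Return the bytes of the block whose MD5 hex digest is md5sum.

    fetch is a hypothetical callable (md5sum -> bytes) standing in for a
    real Keep read. The file layout is an assumption of this sketch.
    """
    path = os.path.join(cache_dir, md5sum)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        # If another process is writing this entry, wait here until it is
        # done instead of fetching a second copy.
        fcntl.flock(f, fcntl.LOCK_EX)
        data = f.read()
        if hashlib.md5(data).hexdigest() == md5sum:
            return data  # valid cache entry
        # Missing, truncated, or corrupt entry: replace it in place.
        data = fetch(md5sum)
        f.seek(0)
        f.truncate()
        f.write(data)
        f.flush()
        return data
    # Closing the file releases the lock.
```

Taking an exclusive lock even for reads keeps the sketch short but serializes readers of the same block; a fuller version might take `LOCK_SH` first and upgrade only when the entry needs to be rewritten.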
Integration:
  • The default Keep mount on shell nodes should use a filesystem cache, assuming there is an appropriate filesystem for it (i.e., something faster than network: tmpfs, SSD, or at least a disk with async/barriers=0).
  • crunch-job should create a per-job temp dir on each node during the "install" phase, and point all arv-mount processes to it.

Related issues

Related to Arvados - Feature #6310: [FUSE] Support scaling the internal block cache based on number of open files (New)
Related to Arvados - Idea #6311: [Maybe] [SDKs] Support caching Keep blocks in memcached (Rejected)
Related to Arvados - Feature #8228: [SDKs] [FUSE] Python SDK and arv-mount use Range requests when a caller requests part of a block that has been ejected from the cache (New)
Has duplicate Arvados - Idea #10510: Allow Keep client to cache blocks to disk (Duplicate, Colin Nolan, 11/10/2016)
Has duplicate Arvados - Feature #18842: Local disk keep cache for Python SDK/arv-mount (Resolved, Peter Amstutz, 10/21/2022)