Idea #3640

closed

[SDKs] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache.

Added by Tom Clegg over 9 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: Keep
Target version: -
Start date:
Due date:
Story points: 2.0

Description

Background:

arv-mount has a block cache, which improves performance when the same blocks are read multiple times. However:
  • Currently a new arv-mount process is started for each Crunch task execution. This means tasks don't share a cache, even if they're running at the same time.
  • In the common case where multiple crunch tasks run at the same time and use the same data, we have multiple arv-mount processes each retrieving and caching its own copy of the same data blocks.
Proposed improvement:
  • Use large swap on worker nodes (preferably SSD). (We already do this for other reasons.)
  • Set up a large tmpfs on worker nodes and use it as crunch job scratch space. (This already gets cleared at the beginning of a job to avoid leakage between jobs/users.)
  • Use a directory in that tmpfs as an arv-mount cache. This makes it feasible to use a large cache size, and makes it easy to share the cache between multiple arv-mount processes.
Implementation notes:
  • Rely on unix permissions for cache privacy. (Warn if the cache dir's mode & 0007 != 0, but go ahead anyway: there will be cases where that would be useful and not dangerous.)
  • Use flock() to avoid races and duplicated effort. (If arv-mount 1 is writing a block to the cache, then arv-mount 2 should wait for arv-mount 1 to finish then read from the cache, rather than fetch its own copy.)
  • Do not clean up cache dir at start/exit, at least by default (the general idea is to share with past/future arv-mount procs). An optional --cache-clear-atexit flag would be nice to have.
  • Measuring/limiting cache size could be interesting
  • Delete & replace upon finding a corrupt/truncated cache entry
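
The implementation notes above can be sketched roughly as follows, in Python since the Python SDK is the main target. This is only an illustration under stated assumptions: `check_cache_dir`, `get_block`, and the `fetch` callable are hypothetical names, not part of the Arvados SDK, and the sketch assumes each cache entry is a file named after the block's MD5 hex digest.

```python
import fcntl
import hashlib
import os
import stat
import warnings


def check_cache_dir(path):
    """Warn, but carry on, if the cache dir grants any access to 'other'."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o007:
        warnings.warn("cache dir %s has mode %o (accessible to others)"
                      % (path, mode))
        return False
    return True


def get_block(cache_dir, md5_hex, fetch):
    """Return block data, fetching at most once among cooperating processes.

    fetch is a hypothetical callable that retrieves the block from Keep.
    """
    path = os.path.join(cache_dir, md5_hex)
    # Open (creating if needed) without truncating, so an entry that
    # another process is still writing is not clobbered.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        # Blocks here while another process holds the lock and writes.
        fcntl.flock(f, fcntl.LOCK_EX)
        data = f.read()
        if hashlib.md5(data).hexdigest() == md5_hex:
            return data  # valid cache entry
        # Missing, truncated, or corrupt entry: replace its contents.
        data = fetch(md5_hex)
        f.seek(0)
        f.truncate()
        f.write(data)
        f.flush()
        return data  # closing the file releases the lock
```

The sketch rewrites a bad entry in place while holding the lock instead of unlinking it, because a process that unlinks and recreates the file would end up locking a different inode than a process that opened the old one. Taking an exclusive lock for reads also serializes readers; a real implementation would probably use a shared lock on the read path.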
Integration:
  • The default Keep mount on shell nodes should use a filesystem cache, assuming there is an appropriate filesystem for it (i.e., something faster than network: tmpfs, SSD, or at least a disk with async/barriers=0).
  • crunch-job should create a per-job temp dir on each node during the "install" phase, and point all arv-mount processes to it.

Related issues

Related to Arvados - Feature #6310: [FUSE] Support scaling the internal block cache based on number of open files (New)
Related to Arvados - Idea #6311: [Maybe] [SDKs] Support caching Keep blocks in memcached (Rejected)
Related to Arvados - Feature #8228: [SDKs] [FUSE] Python SDK and arv-mount use Range requests when a caller requests part of a block that has been ejected from the cache (New)
Has duplicate Arvados - Idea #10510: Allow Keep client to cache blocks to disk (Duplicate; Colin Nolan; 11/10/2016)
Has duplicate Arvados - Feature #18842: Local disk keep cache for Python SDK/arv-mount (Resolved; Peter Amstutz; 10/21/2022)
Actions #1

Updated by Tom Clegg over 9 years ago

  • Description updated (diff)
  • Category set to Keep
Actions #2

Updated by Tom Clegg over 9 years ago

  • Target version set to Arvados Future Sprints
Actions #3

Updated by Tom Clegg over 9 years ago

  • Subject changed from [FUSE] Add runtime option to use a filesystem directory block cache as an alternative to RAM cache. to [FUSE] Add runtime option to arv-mount to use a filesystem directory block cache as an alternative to RAM cache.
Actions #4

Updated by Tom Clegg over 9 years ago

  • Subject changed from [FUSE] Add runtime option to arv-mount to use a filesystem directory block cache as an alternative to RAM cache. to [FUSE] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache.
Actions #5

Updated by Tom Clegg over 9 years ago

  • Subject changed from [FUSE] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache. to [SDKs] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache.
Actions #6

Updated by Tom Clegg almost 9 years ago

  • Description updated (diff)
Actions #7

Updated by Ward Vandewege almost 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #8

Updated by Peter Amstutz about 2 years ago

  • Has duplicate Feature #18842: Local disk keep cache for Python SDK/arv-mount added
Actions #10

Updated by Peter Amstutz almost 2 years ago

Implementation brainstorm.

Build this feature around mmap()

When fetching a block, first check the memory cache, then the disk cache, and only then fetch it from Keep.

  1. When fetching a block from Keep, keep it in memory and start asynchronously writing it out to disk
    • We want to be able to serve reads immediately without waiting for the disk cache machinery
  2. Once it has been written to disk it can be ejected from the memory cache
  3. When we find a block in the disk cache, open it and use mmap(); this gives us something that behaves like a memory buffer
  4. Separately keep track of open file descriptors and close ones that haven't been used recently
  5. Separately keep track of space used by blocks on disk and delete least recently used ones
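
A rough Python sketch of steps 1-3 and 5, offered as an illustration rather than the eventual implementation. `TieredBlockCache`, `evict_lru`, and the `fetch` callable are hypothetical names, and the sketch assumes file access time is a usable "recently used" signal, which may not hold on filesystems mounted with noatime. Step 4 (closing idle file descriptors) and thread-safe bookkeeping are left out.

```python
import mmap
import os
from concurrent.futures import ThreadPoolExecutor


class TieredBlockCache:
    """Look up blocks in RAM, then on disk via mmap, then in Keep."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch    # hypothetical callable: locator -> bytes
        self.memory = {}      # locator -> bytes not yet written to disk
        self.mapped = {}      # locator -> mmap of the on-disk copy
        self.writer = ThreadPoolExecutor(max_workers=1)

    def _path(self, locator):
        return os.path.join(self.cache_dir, locator)

    def _write_then_eject(self, locator, data):
        # Write to a temp name and rename, so readers never see a
        # partially written entry; then drop the in-memory copy (step 2).
        tmp = self._path(locator) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, self._path(locator))
        self.memory.pop(locator, None)

    def get(self, locator):
        data = self.memory.get(locator)
        if data is not None:
            return data
        buf = self.mapped.get(locator)
        if buf is not None:
            return buf
        path = self._path(locator)
        if os.path.exists(path) and os.path.getsize(path) > 0:
            with open(path, "rb") as f:
                # Step 3: the mapping stays valid after the file is closed.
                buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            self.mapped[locator] = buf
            return buf
        # Step 1: serve from memory now, write to disk in the background.
        data = self.fetch(locator)
        self.memory[locator] = data
        self.writer.submit(self._write_then_eject, locator, data)
        return data


def evict_lru(cache_dir, max_bytes):
    """Step 5: delete least recently accessed files until under max_bytes."""
    entries = []
    for name in os.listdir(cache_dir):
        st = os.stat(os.path.join(cache_dir, name))
        entries.append((st.st_atime, st.st_size, name))
    total = sum(size for _, size, _ in entries)
    removed = []
    for _, size, name in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(os.path.join(cache_dir, name))
        total -= size
        removed.append(name)
    return removed
```

A caller would construct one `TieredBlockCache` per process and run `evict_lru` periodically; on Unix, blocks that are still mapped stay readable after their files are deleted, until the mapping is closed.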
Benefits of the mmap approach:
  • existing code for reassembling files from blocks mostly doesn't have to change
  • avoid making a read() syscall in the happy case (no page fault)
  • able to leverage the kernel's filesystem cache to balance between user process memory & cache memory
  • a file that has just been written and then re-opened may still be in the filesystem cache, so reading it may not block on disk activity at all
  • can have a much larger default cache, users don't have to think about the Arvados cache
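
To illustrate the first benefit: in Python, an `mmap` object supports `len()` and slicing just like `bytes`, so code that reassembles a byte range from a list of blocks does not need to know which kind it was handed. The `read_range` helper below is a hypothetical stand-in written for this example, not the SDK's actual reassembly code.

```python
import mmap
import tempfile


def read_range(blocks, start, length):
    """Assemble bytes [start, start+length) from a list of block buffers.

    Each block only needs len() and slicing, so bytes and mmap both work.
    """
    out = bytearray()
    offset = 0  # position of the current block within the whole file
    for block in blocks:
        block_end = offset + len(block)
        if block_end > start and offset < start + length:
            lo = max(start - offset, 0)
            hi = min(start + length - offset, len(block))
            out += block[lo:hi]
        offset = block_end
    return bytes(out)


with tempfile.TemporaryFile() as f:
    f.write(b"world!")
    f.flush()
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Mix an in-memory block with a memory-mapped one.
    result = read_range([b"hello ", mapped], 3, 6)
    mapped.close()

print(result)  # b'lo wor'
```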
Actions #11

Updated by Brett Smith about 1 year ago

  • Status changed from New to Closed