Idea #6311

closed

[Maybe] [SDKs] Support caching Keep blocks in memcached

Added by Brett Smith almost 9 years ago. Updated about 5 years ago.

Status:
Rejected
Priority:
Normal
Assigned To:
-
Category:
SDKs
Target version:
-
Start date:
Due date:
Story points:
-

Description

We could potentially improve job performance by running memcached on each compute node to store Keep blocks. When a node is running many tasks from a job that access the same data, this cache could make it possible for the block to be downloaded to the node once, then shared across tasks.

If we decide to go ahead with this caching strategy, add support to the Python SDK's KeepClient for using a memcached store when one is available.
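A rough sketch of what such a read-through cache might look like is below. It is not the SDK's actual code: `CachingBlockReader`, `DictStore`, and the `fetch` callable are hypothetical names invented for illustration, and the store is only assumed to offer `get(key)` and `set(key, value)` methods (pymemcache's `Client` has methods of roughly this shape). The sketch keys each block by the MD5 portion of its locator and re-verifies the hash on every cache hit. It ignores one practical obstacle: memcached's default item size limit (1 MB) is far smaller than the largest Keep block (64 MiB), so a real deployment would need to raise that limit or split blocks into chunks.

```python
import hashlib
from typing import Callable, Dict, Optional


class DictStore:
    """In-process stand-in for a memcached client (demonstration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> bool:
        self._data[key] = value
        return True


class CachingBlockReader:
    """Hypothetical read-through cache placed in front of a block fetcher.

    `store` is any object with get(key) and set(key, value).
    `fetch` is an assumed callable that downloads a block given its locator.
    """

    def __init__(self, store, fetch: Callable[[str], bytes]) -> None:
        self.store = store
        self.fetch = fetch

    def get(self, locator: str) -> bytes:
        # A Keep locator begins with the block's MD5, e.g. "<md5>+<size>+...".
        md5 = locator.split("+", 1)[0]
        cached = self.store.get(md5)
        # Trust a cached copy only if its content still matches the hash.
        if cached is not None and hashlib.md5(cached).hexdigest() == md5:
            return cached
        data = self.fetch(locator)
        self.store.set(md5, data)
        return data
```

With a shared store on the node, two tasks asking for the same locator would trigger a single download; the second call is served from the cache after the hash check.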


Related issues

Related to Arvados - Idea #3640: [SDKs] Add runtime option to SDKs (esp Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache. (Closed)
Actions #1

Updated by Brett Smith almost 9 years ago

  • Description updated (diff)
  • Category set to SDKs
Actions #2

Updated by Tom Clegg almost 9 years ago

A couple of things could make this much more troublesome than #3640:
  • Orchestrating turning up/down memcached when jobs start and stop.
  • Firewalling user/job A's memcached from user/job B's memcached. E.g., crunch2 allowing >1 job per node, shared shell VM.

Memcached is good for sharing free memory (freely!) across nodes. Given that each job has distinct permissions, we'd essentially need a VPN per job in order to take advantage of that feature. And without that feature, I'm not sure memcached would perform any better than a tmpfs-backed filesystem cache.

Actions #3

Updated by Tom Clegg almost 7 years ago

  • Status changed from New to Rejected
Actions #4

Updated by Tom Morris about 5 years ago

  • Target version deleted (Arvados Future Sprints)