Story #6311

[Maybe] [SDKs] Support caching Keep blocks in memcached

Added by Brett Smith about 4 years ago. Updated 4 months ago.

Status: Rejected
Priority: Normal
Assigned To: -
Category: SDKs
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

We could potentially improve job performance by running memcached on each compute node to store Keep blocks. When a node runs many tasks from the same job that read the same data, this cache would let each block be downloaded to the node once and then shared across those tasks.

If we adopt this caching strategy, add support to the Python SDK's KeepClient for using a memcached store when one is available.
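A minimal sketch of what read-through support could look like, written in Python since the story targets the Python SDK. Everything here is hypothetical: `CachingBlockReader`, the `BlockStore` protocol, and `DictStore` are illustrative names, not part of the Arvados SDK, and `DictStore` stands in for a real memcached client, which typically exposes a similar `get`/`set` pair. The sketch assumes the cache key is the MD5 digest at the start of a Keep locator, so the same block maps to the same entry regardless of per-user permission hints.

```python
import hashlib
from typing import Callable, Dict, Optional, Protocol


class BlockStore(Protocol):
    """Hypothetical cache interface: the get/set shape memcached clients usually expose."""

    def get(self, key: str) -> Optional[bytes]: ...

    def set(self, key: str, value: bytes) -> None: ...


class DictStore:
    """In-process stand-in for a memcached client (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class CachingBlockReader:
    """Hypothetical read-through cache in front of a Keep fetch function."""

    def __init__(self, fetch: Callable[[str], bytes],
                 store: Optional[BlockStore] = None) -> None:
        self._fetch = fetch   # downloads a block from Keep, given its locator
        self._store = store   # None means no memcached is available on this node

    def get(self, locator: str) -> bytes:
        # Keep locators start with the block's MD5 hex digest. Permission hints
        # after '+' differ per user and expire, so they are left out of the key.
        md5 = locator.split("+", 1)[0]
        key = "keep:" + md5
        if self._store is not None:
            cached = self._store.get(key)
            # Blocks are content-addressed, so a hit can be verified cheaply;
            # an entry that fails the check is treated as a miss.
            if cached is not None and hashlib.md5(cached).hexdigest() == md5:
                return cached
        data = self._fetch(locator)
        if self._store is not None:
            self._store.set(key, data)
        return data
```

Keying on the bare hash is what lets tasks share entries, but it also means any process that can reach the memcached instance and knows a block's hash can read that block without a valid permission signature, which is the isolation problem raised in the history below. Memcached's default maximum item size is also 1 MB, far smaller than a full 64 MiB Keep block, so a real implementation would have to raise that limit or split blocks across several keys.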


Related issues

Related to Arvados - Story #3640: [SDKs] Add runtime option to SDKs (esp. Python and arv-mount) to use a filesystem directory block cache as an alternative to RAM cache. (New)

History

#1 Updated by Brett Smith about 4 years ago

  • Description updated
  • Category set to SDKs

#2 Updated by Tom Clegg almost 4 years ago

A couple of things could make this much more troublesome than #3640:
  • Orchestrating turning up/down memcached when jobs start and stop.
  • Firewalling user/job A's memcached from user/job B's memcached, e.g., when crunch2 allows more than one job per node, or on a shared shell VM.

Memcached is good for sharing free memory (freely!) across nodes. Given that each job has distinct permissions, we'd essentially need a VPN per job in order to take advantage of that feature. And without that feature, I'm not sure memcached would perform any better than a tmpfs-backed filesystem cache.
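For comparison, a tmpfs-backed filesystem cache might look roughly like the sketch below. Again this is Python and again hypothetical: `DirectoryBlockCache` and the caller-supplied cache directory are illustrative, not an Arvados API. Tasks on one node share blocks through files named by MD5 digest. A writer publishes a block by renaming a completed temporary file into place, so concurrent readers never see a partial block, and ordinary Unix file permissions can provide the per-user isolation that a shared memcached lacks.

```python
import hashlib
import os
import tempfile
from typing import Callable, Optional


class DirectoryBlockCache:
    """Hypothetical per-node block cache stored as files in one directory."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]) -> None:
        self._dir = cache_dir
        self._fetch = fetch  # downloads a block from Keep, given its locator
        # A newly created directory gets mode 0o700 (subject to umask),
        # so other Unix users cannot list or read the cached blocks.
        os.makedirs(cache_dir, mode=0o700, exist_ok=True)

    def _read_cached(self, md5: str) -> Optional[bytes]:
        try:
            with open(os.path.join(self._dir, md5), "rb") as f:
                data = f.read()
        except FileNotFoundError:
            return None
        # Treat a file whose content does not match its name as a miss.
        return data if hashlib.md5(data).hexdigest() == md5 else None

    def get(self, locator: str) -> bytes:
        md5 = locator.split("+", 1)[0]  # locator starts with the MD5 hex digest
        data = self._read_cached(md5)
        if data is not None:
            return data
        data = self._fetch(locator)
        # Write to a temp file in the same directory, then rename it into place.
        # os.replace is atomic on POSIX, so readers see no file or a whole block.
        fd, tmp = tempfile.mkstemp(dir=self._dir)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, os.path.join(self._dir, md5))
        except BaseException:
            os.unlink(tmp)
            raise
        return data
```

Pointing `cache_dir` at a tmpfs mount (for example `/dev/shm` on many Linux systems, an assumption about the node's setup) keeps the cache in RAM; pointing it at local disk gives the kind of directory cache proposed in #3640.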

#3 Updated by Tom Clegg almost 2 years ago

  • Status changed from New to Rejected

#4 Updated by Tom Morris 4 months ago

  • Target version deleted (Arvados Future Sprints)
