Bug #19277


Arvados client inside container should use local keepstore

Added by Tom Clegg almost 2 years ago. Updated over 1 year ago.



Currently, when crunch-run starts a local keepstore process, it gets used by arv-mount, but is not advertised to, or usable from, processes inside the container.

The result is that, when many containers are running, a container that uses the Arvados API natively (as lightning does) oversaturates the fixed keepstore gateway nodes while the closer/faster keepstore processes sit idle. (Example: #19236#note-9)

  • local keepstore listens on non-loopback interface(s) (currently listens only on localhost, which is inaccessible from the container)
  • crunch-run passes suitable ARVADOS_KEEP_SERVICES env var into the container
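As a rough illustration of the second point, the environment entry handed to the container could be assembled along these lines. This is only a sketch: the helper name `keepServicesEnv`, the `http` scheme, and the address and port in the example are assumptions for illustration, not what crunch-run necessarily does.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// keepServicesEnv is a hypothetical helper: it builds the
// ARVADOS_KEEP_SERVICES entry pointing at a keepstore process listening
// on host:port. The "http" scheme is an assumption for this sketch.
func keepServicesEnv(host string, port int) string {
	u := url.URL{
		Scheme: "http",
		Host:   net.JoinHostPort(host, strconv.Itoa(port)),
		Path:   "/",
	}
	return "ARVADOS_KEEP_SERVICES=" + u.String()
}

func main() {
	// Example address and port are made up for illustration.
	fmt.Println(keepServicesEnv("172.17.0.1", 25107))
	// prints "ARVADOS_KEEP_SERVICES=http://172.17.0.1:25107/"
}
```

The point is that the URL has to name an address the container can reach, which is why the localhost-only listener is not enough.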

Subtasks 1 (0 open, 1 closed)

Task #19279: Review 19277-local-keep-from-ctr (Resolved, Lucas Di Pentima, 07/19/2022)
Actions #1

Updated by Tom Clegg almost 2 years ago

Choosing the best listening address is a bit awkward.

In the docker case, the docker0 interface seems ideal: it's known before the container starts, the container can connect to it, but it's not routable from outside the worker host.

In the singularity case, the host side of the host/container private network link doesn't exist until we ask singularity to start a container, which means we've already told singularity what we want the ARVADOS_KEEP_SERVICES env var to be.

I think the following approach should work consistently for both cases:
  • get local IP addresses from /proc/$$/net/fib_trie (we have a function for this in lib/crunchrun/singularity.go)
  • sort numerically ( <
  • choose the first address that is not loopback (127/8), VPN (100.64/10), or link-local (169.254/16)
  • use this (instead of localhost) as the listening address for keepstore, and the KEEP_SERVICES url passed to arv-mount and into the container
Actions #2

Updated by Tom Clegg almost 2 years ago

19277-local-keep-from-ctr @ 748ee07068ed64fa2e12901ce43f548bd4ff213a -- developer-run-tests: #3237

Installed this on 2xpu4 to rescue a workflow that was overloading keepstores and dying, and this seemed to do the trick.

Actions #3

Updated by Lucas Di Pentima almost 2 years ago

Looks good, please merge. Thanks!

Actions #4

Updated by Tom Clegg almost 2 years ago

  • Status changed from In Progress to Resolved
Actions #5

Updated by Tom Clegg over 1 year ago

  • Release set to 53
Actions #6

Updated by Peter Amstutz over 1 year ago

  • Release deleted (53)

I tried to cherry-pick this onto 2.4-staging and it didn't apply cleanly, so I'm rejecting it for 2.4.3.

Actions #7

Updated by Tom Clegg over 1 year ago

  • Release set to 53

Cherry-picked to 2.4-release as 6e992b73bf60a23b2ca10ca9694e5dff4d1497cc.

