Bug #19277
closed
Arvados client inside container should use local keepstore
Description
Currently, when crunch-run starts a local keepstore process, it gets used by arv-mount, but is not advertised to, or usable from, processes inside the container.
The result is that, when many containers are running, a container that uses the Arvados API natively (as lightning does) oversaturates the fixed keepstore gateway nodes while the closer/faster keepstore processes sit idle. (Example: #19236#note-9)
Implementation
- local keepstore listens on non-loopback interface(s) (currently listens only on localhost, which is inaccessible from the container)
- crunch-run passes suitable ARVADOS_KEEP_SERVICES env var into the container
Updated by Tom Clegg over 2 years ago
Choosing the best listening address is a bit awkward.
In the docker case, the docker0 interface seems ideal: it's known before the container starts, the container can connect to it, but it's not routable from outside the worker host.
In the singularity case, the host side of the host/container private network link doesn't exist until we ask singularity to start a container, which means we've already told singularity what we want the ARVADOS_KEEP_SERVICES env var to be.
I think the following approach should work consistently for both cases:
- get local IP addresses from /proc/$$/net/fib_trie (we have a function for this in lib/crunchrun/singularity.go)
- sort numerically (10.2.2.2 < 10.10.10.10)
- choose the first address that is not loopback (127/8), VPN (100.64/10), or link-local (169.254/16)
- use this (instead of localhost) as the listening address for keepstore, and the KEEP_SERVICES url passed to arv-mount and into the container
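The selection steps above can be sketched roughly as follows. This is only an illustration in Python, not the crunch-run code: the function names choose_listen_address and keep_services_url are hypothetical, the candidate addresses are assumed to have been collected already (for example from fib_trie), and the http scheme and port number are assumptions.

```python
import ipaddress

# Ranges the proposal skips: loopback, "VPN" (100.64/10), and link-local.
EXCLUDED = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("100.64.0.0/10"),
    ipaddress.ip_network("169.254.0.0/16"),
]


def choose_listen_address(candidates):
    """Return the numerically lowest IPv4 address outside EXCLUDED, or None.

    `candidates` is an iterable of dotted-quad strings (hypothetical input;
    gathering them from the host is not shown here).
    """
    # IPv4Address objects compare by numeric value, so 10.2.2.2 < 10.10.10.10.
    for addr in sorted(ipaddress.IPv4Address(c) for c in candidates):
        if not any(addr in net for net in EXCLUDED):
            return str(addr)
    return None


def keep_services_url(addr, port):
    """Build a URL for the local keepstore (scheme and port are assumptions)."""
    return f"http://{addr}:{port}/"
```

For example, given the candidates 127.0.0.1, 10.10.10.10, 169.254.1.1, 10.2.2.2 and 100.64.0.5, the sketch picks 10.2.2.2, and the resulting URL is what would be used both as the keepstore listening address and as the KEEP_SERVICES value.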
Updated by Tom Clegg over 2 years ago
19277-local-keep-from-ctr @ 748ee07068ed64fa2e12901ce43f548bd4ff213a -- developer-run-tests: #3237
Installed this on 2xpu4 to rescue a workflow that was overloading keepstores and dying, and this seemed to do the trick.
Updated by Tom Clegg over 2 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|420e857f8e8ac75beca258fa72b9edac680500cd.
Updated by Peter Amstutz over 2 years ago
- Release deleted (53)
I tried to cherry-pick this onto 2.4-staging and it didn't apply cleanly, so I'm rejecting it for 2.4.3.
Updated by Tom Clegg over 2 years ago
- Release set to 53
cherry-picked to 2.4-release as 6e992b73bf60a23b2ca10ca9694e5dff4d1497cc