Bug #19277

closed

Arvados client inside container should use local keepstore

Added by Tom Clegg 3 months ago. Updated 13 days ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Start date: 07/19/2022
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Currently, when crunch-run starts a local keepstore process, it gets used by arv-mount, but is not advertised to, or usable from, processes inside the container.

The result is that, when many containers are running, a container that uses the Arvados API natively (as lightning does) oversaturates the fixed keepstore gateway nodes while the closer/faster keepstore processes sit idle. (Example: #19236#note-9)

Implementation
  • local keepstore listens on non-loopback interface(s) (currently listens only on localhost, which is inaccessible from the container)
  • crunch-run passes suitable ARVADOS_KEEP_SERVICES env var into the container

Subtasks: 1 (0 open, 1 closed)

Task #19279: Review 19277-local-keep-from-ctr | Resolved | Lucas Di Pentima | 07/19/2022

Actions #1

Updated by Tom Clegg 3 months ago

Choosing the best listening address is a bit awkward.

In the docker case, the docker0 interface seems ideal: it's known before the container starts, the container can connect to it, but it's not routable from outside the worker host.

In the singularity case, the host side of the host/container private network link doesn't exist until we ask singularity to start a container, which means we've already told singularity what we want the ARVADOS_KEEP_SERVICES env var to be.

I think the following approach should work consistently for both cases:
  • get local IP addresses from /proc/$$/net/fib_trie (we have a function for this in lib/crunchrun/singularity.go)
  • sort numerically (10.2.2.2 < 10.10.10.10)
  • choose the first address that is not loopback (127/8), VPN (100.64/10), or link-local (169.254/16)
  • use this (instead of localhost) as the listening address for keepstore, and the KEEP_SERVICES url passed to arv-mount and into the container
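
The selection steps above can be sketched roughly as follows. This is only an illustration, not the code from the branch: the function name chooseListenAddr is hypothetical, the candidate addresses are passed in as strings rather than read from /proc/$$/net/fib_trie, and restricting the choice to IPv4 is an assumption.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// excluded holds the ranges the note says to skip:
// loopback, 100.64/10, and link-local.
var excluded = []netip.Prefix{
	netip.MustParsePrefix("127.0.0.0/8"),
	netip.MustParsePrefix("100.64.0.0/10"),
	netip.MustParsePrefix("169.254.0.0/16"),
}

// chooseListenAddr (hypothetical name) sorts the candidate IPv4
// addresses numerically and returns the first one that is not in
// an excluded range. The second result is false if none qualifies.
func chooseListenAddr(candidates []string) (string, bool) {
	var addrs []netip.Addr
	for _, s := range candidates {
		a, err := netip.ParseAddr(s)
		if err != nil || !a.Is4() {
			continue // assumption: ignore unparsable and non-IPv4 entries
		}
		addrs = append(addrs, a)
	}
	// Numeric ordering, so 10.2.2.2 sorts before 10.10.10.10.
	sort.Slice(addrs, func(i, j int) bool { return addrs[i].Less(addrs[j]) })
	for _, a := range addrs {
		skip := false
		for _, p := range excluded {
			if p.Contains(a) {
				skip = true
				break
			}
		}
		if !skip {
			return a.String(), true
		}
	}
	return "", false
}

func main() {
	addr, ok := chooseListenAddr([]string{
		"127.0.0.1", "10.10.10.10", "169.254.1.5", "10.2.2.2", "100.64.0.7",
	})
	fmt.Println(addr, ok) // prints "10.2.2.2 true"
}
```

Sorting before filtering keeps the choice deterministic: for the same set of interfaces, keepstore, arv-mount, and the container all end up with the same address.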
Actions #2

Updated by Tom Clegg 3 months ago

19277-local-keep-from-ctr @ 748ee07068ed64fa2e12901ce43f548bd4ff213a -- developer-run-tests: #3237

Installed this on 2xpu4 to rescue a workflow that was overloading keepstores and dying, and this seemed to do the trick.

Actions #3

Updated by Lucas Di Pentima 2 months ago

Looks good, please merge. Thanks!

Actions #4

Updated by Tom Clegg 2 months ago

  • Status changed from In Progress to Resolved
Actions #5

Updated by Tom Clegg about 1 month ago

  • Release set to 53
Actions #6

Updated by Peter Amstutz 13 days ago

  • Release deleted (53)

I tried to cherry-pick this onto 2.4-staging and it didn't apply cleanly, so I'm rejecting it for 2.4.3.

Actions #7

Updated by Tom Clegg 13 days ago

  • Release set to 53

cherry-picked to 2.4-release as 6e992b73bf60a23b2ca10ca9694e5dff4d1497cc
