Bug #18547

[keep-balance] Avoid redundant indexing when multiple keepstore servers use a single NFS mount

Added by Tom Clegg about 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 12/06/2021
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Background: Currently keep-balance detects when a storage device like an S3 bucket is used by multiple keepstore servers, and arbitrarily chooses one of them to get the index. However, this relies on the "device ID" returned by keepstore, which is
  • s3://endpoint/bucketname, if the volume is an S3 bucket
  • block device UUID, if the volume is a local filesystem
  • empty, if the volume is a network-mounted filesystem

When the device ID is empty, keep-balance cannot tell whether two volumes are the same, so it indexes all of them, even when multiple keepstore servers share a single NFS mount.

Three things are now true that were not when the "device ID" approach started: each keepstore server uses the same configuration file, each configured volume has a unique UUID, and the volume UUID is returned in the list of mounts reported by keepstore. keep-balance should therefore detect identical/duplicate mounts by comparing volume UUIDs instead of device IDs.

(Note this will confuse keep-balance if the config uses a single volume UUID to mount a local disk like "/data" on multiple keepstore machines. But the install docs explicitly describe not doing that, and it is not a kind of configuration we want to support. Worst outcome is that someone with this kind of wonky config would see a lot of blocks misreported as underreplicated or missing by keep-balance until they fix their config.)
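As a rough illustration of the proposed behaviour, the sketch below groups reported mounts by volume UUID and picks one server per UUID to index. It is not the keep-balance implementation: the `Mount` type, the `indexTargets` function, and the server names are hypothetical, and choosing the first server in sorted order is only an assumption made to keep the example deterministic (the description above just says the choice is arbitrary).

```go
package main

import "sort"

// Mount is a hypothetical, simplified stand-in for one entry in the list of
// mounts reported by a keepstore server.
type Mount struct {
	VolumeUUID string // UUID of the configured volume
	DeviceID   string // "s3://endpoint/bucketname", a block device UUID, or ""
}

// indexTargets returns, for each distinct volume UUID, the one server that
// should be asked for that volume's index. Mounts that share a volume UUID
// are treated as the same storage, regardless of their DeviceID.
func indexTargets(mountsByServer map[string][]Mount) map[string]string {
	// Visit servers in sorted order so the result is deterministic
	// (an assumption of this sketch, not a documented behaviour).
	servers := make([]string, 0, len(mountsByServer))
	for server := range mountsByServer {
		servers = append(servers, server)
	}
	sort.Strings(servers)

	chosen := map[string]string{} // volume UUID -> server to index
	for _, server := range servers {
		for _, m := range mountsByServer[server] {
			if _, seen := chosen[m.VolumeUUID]; !seen {
				chosen[m.VolumeUUID] = server
			}
		}
	}
	return chosen
}
```

With two servers that both mount the same NFS-backed volume (empty device ID, same volume UUID), this yields a single index target for that volume rather than one per server.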


Subtasks

Task #18561: Review 18547-use-volume-uuid-not-device-id (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Bug #18376: [keepstore] Avoid long-lived readdirent cookies in filesystem driver (Resolved, 11/16/2021)

Blocks Arvados - Story #18518: Release Arvados 2.3.2 (Resolved, 12/06/2021)

History

#1 Updated by Tom Clegg about 2 months ago

  • Related to Bug #18376: [keepstore] Avoid long-lived readdirent cookies in filesystem driver added

#2 Updated by Tom Clegg about 2 months ago

  • Status changed from New to In Progress

#4 Updated by Tom Clegg about 2 months ago

Wondering whether we want a more limited version of this for 2.3.2 ("only use UUID if deviceID is empty") just in case the full version affects non-NFS setups in an unexpected way...

#5 Updated by Tom Clegg about 2 months ago

TODO: keep-balance should error out if two volumes return the same non-empty DeviceID.
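One possible reading of this check, sketched under assumptions: "two volumes" is taken to mean two mounts with different volume UUIDs, so the same volume reported by several servers is not an error. The `Mount` type and `checkDeviceIDs` function are hypothetical names for illustration, not the code in the branch.

```go
package main

import "fmt"

// Mount is a hypothetical, simplified stand-in for one entry in the list of
// mounts reported by a keepstore server.
type Mount struct {
	VolumeUUID string
	DeviceID   string
}

// checkDeviceIDs returns an error if two mounts with different volume UUIDs
// report the same non-empty DeviceID. Empty DeviceIDs are ignored, because
// they carry no information about the underlying storage.
func checkDeviceIDs(mounts []Mount) error {
	seen := map[string]string{} // DeviceID -> first volume UUID reporting it
	for _, m := range mounts {
		if m.DeviceID == "" {
			continue
		}
		if prev, ok := seen[m.DeviceID]; ok && prev != m.VolumeUUID {
			return fmt.Errorf("volumes %s and %s report the same DeviceID %q",
				prev, m.VolumeUUID, m.DeviceID)
		}
		seen[m.DeviceID] = m.VolumeUUID
	}
	return nil
}
```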

#6 Updated by Peter Amstutz about 2 months ago

  • Release set to 48

#7 Updated by Peter Amstutz about 2 months ago

#8 Updated by Tom Clegg about 2 months ago

18547-use-volume-uuid-not-device-id @ 24f140f9ed1a2180541c0c7cebf7572c5155fe27 -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2829/
  • error out if two volumes return the same non-empty DeviceID

#9 Updated by Peter Amstutz about 2 months ago

Tom Clegg wrote:

18547-use-volume-uuid-not-device-id @ 24f140f9ed1a2180541c0c7cebf7572c5155fe27 -- https://ci.arvados.org/view/Developer/job/developer-run-tests/2829/
  • error out if two volumes return the same non-empty DeviceID

This LGTM. Could you please merge into both main and 2.3-dev?

#10 Updated by Tom Clegg about 2 months ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados-private:commit:arvados|920307882b3fe52a08b366a1c81e62f44ee639b9.

#11 Updated by Tom Clegg about 2 months ago

Merged, and cherry-picked e16866d0f and 24f140f9e onto 2.3-dev as 11864d817 and 56c37ef9b respectively.
