Feature #18768

closed

Design for ability to check what config is in use across the cluster

Added by Peter Amstutz 12 months ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

Create an implementation plan for automatically detecting when the active services are running with mismatched versions of the config, and for reporting the mismatch to the operator (via logs, health checks, and metrics).

  • During operation, each service checks for config changes. If the config file changes, the service loads and validates it, then adds a health check warning that the config file on disk does not match the config in memory. If the file fails validation (which means the service would fail if restarted), the service reports that as well.
  • A Prometheus metric reports 0 or 1 to indicate whether the config on disk matches the config in memory.
  • The health check reports the md5sum and timestamp of the config file on disk.
    • The health check aggregator can check whether the sums reported by the services match.
  • Add a command line tool to arvados-client that uses the same logic as the health check aggregator to report the health check results of all the services.
  • The public config published by controller should include a timestamp for the config's last modified time.
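
The service-side check in the first three items could be sketched roughly as follows. This is only an illustration in Python, not the planned implementation. The names `ConfigStatus`, `check_config`, `validate`, and `metric_value` are hypothetical, the validator is a stand-in that accepts any JSON object (a real service would apply its own config schema), and the choice of 1 for "matches" and 0 for "differs" is an assumption, since the ticket does not say which value means which.

```python
import hashlib
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ConfigStatus:
    disk_md5: str         # md5sum of the config file currently on disk
    disk_mtime: float     # modification time of that file (Unix seconds)
    matches_memory: bool  # True if the on-disk content equals the loaded config
    valid: bool           # True if the on-disk content passes validation


def md5_of(data: bytes) -> str:
    """Return the hex md5 digest of the raw config bytes."""
    return hashlib.md5(data).hexdigest()


def validate(data: bytes) -> bool:
    """Hypothetical validator: accept any JSON object.

    A real service would apply its own config schema here.
    """
    try:
        return isinstance(json.loads(data), dict)
    except ValueError:
        return False


def check_config(path: str, loaded_md5: str) -> ConfigStatus:
    """Compare the config file on disk with the checksum of the config in memory."""
    with open(path, "rb") as f:
        data = f.read()
    disk_md5 = md5_of(data)
    return ConfigStatus(
        disk_md5=disk_md5,
        disk_mtime=os.path.getmtime(path),
        matches_memory=(disk_md5 == loaded_md5),
        valid=validate(data),
    )


def metric_value(status: ConfigStatus) -> int:
    """Gauge value: 1 if disk matches memory, else 0 (assumed polarity)."""
    return 1 if status.matches_memory else 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        with open(path, "wb") as f:
            f.write(b'{"Clusters": {}}')
        loaded = md5_of(b'{"Clusters": {}}')  # checksum taken at service start
        print(check_config(path, loaded), metric_value(check_config(path, loaded)))
```

Reporting `valid` separately from `matches_memory` reflects the distinction drawn in the first item: a changed file is a warning, while a changed file that fails validation also means the service would not survive a restart.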

This phase of implementation is for reporting/detecting config changes only, not responding to them.
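
The aggregator-side comparison described in the list above (checking whether the md5sums reported by the services match) might look something like this. It is a sketch under assumptions: the function names are hypothetical, the input is assumed to be a mapping from service name to the md5sum reported by that service's health check, and the service names in the usage example are purely illustrative. In keeping with the scope of this phase, it only reports the mismatch.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_checksum(reports: Dict[str, str]) -> Dict[str, List[str]]:
    """Group service names by the config md5sum each one reported."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for service, md5 in sorted(reports.items()):
        groups[md5].append(service)
    return dict(groups)


def config_skew_warnings(reports: Dict[str, str]) -> List[str]:
    """Return one warning line per distinct checksum, or [] if all agree."""
    groups = group_by_checksum(reports)
    if len(groups) <= 1:
        return []
    return [
        f"config {md5}: {', '.join(services)}"
        for md5, services in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Illustrative input: two services agree, one differs.
    reports = {"controller": "aaa", "keepstore": "bbb", "keepproxy": "aaa"}
    for line in config_skew_warnings(reports):
        print(line)
```

Because the function takes plain data and returns plain strings, the same logic could back both the health check aggregator and a command line report.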


Related issues

Related to Arvados Epics - Story #18727: Avoid configuration skew between different services and hosts (Resolved, 03/01/2022, 05/31/2022)

Related to Arvados - Feature #18794: cluster health check fails if some services are using different configs (Resolved, Tom Clegg, 05/06/2022)

Actions #1

Updated by Peter Amstutz 12 months ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz 12 months ago

  • Status changed from In Progress to New
  • Subject changed from Ability to check what config is in use across the cluster to Design for ability to check what config is in use across the cluster
  • Tracker changed from Bug to Feature
Actions #3

Updated by Peter Amstutz 12 months ago

  • Related to Story #18727: Avoid configuration skew between different services and hosts added
Actions #4

Updated by Peter Amstutz 12 months ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz 12 months ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz 12 months ago

  • Assigned To set to Tom Clegg
Actions #7

Updated by Tom Clegg 12 months ago

  • Related to Feature #18794: cluster health check fails if some services are using different configs added
Actions #8

Updated by Tom Clegg 11 months ago

  • Status changed from New to In Progress
Actions #9

Updated by Peter Amstutz 11 months ago

  • Status changed from In Progress to Resolved