Feature #18768

closed

Design for ability to check what config is in use across the cluster

Added by Peter Amstutz about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -

Description

Create an implementation plan to automatically detect when a version mismatch of the config exists among the various active services, and report this to the operator (via logs, health checks, metrics).

  • During operation, each service checks for config changes. If the config file changes, the service loads it and validates it, then adds a health check warning that the config file on disk does not match the config in memory. If the config file fails validation (which means the service would fail if restarted), the service reports that as well.
  • A Prometheus metric reports 0 or 1 to indicate whether the config on disk matches the config in memory.
  • The health check reports the md5sum and timestamp of the config file on disk.
    • The health check aggregator can then check whether the sums reported by different services match.
  • Add a command line tool to arvados-client that uses the same logic as the health check aggregator to report the health check results of all the services.
  • The public config published by controller should include a timestamp for the config's last-modified time.
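
A minimal sketch of the per-service check described in the list above, written in Python for illustration only (it is not the Arvados implementation). The function name `config_health`, the report keys, the `validate` callable, and the choice of 1 to mean "matches" are all assumptions made for this example; the description does not fix any of them.

```python
import hashlib
import os


def config_health(path, loaded_md5, validate):
    """Compare the config file on disk with the config loaded at startup.

    `loaded_md5` is the hex digest recorded when the service loaded its config.
    `validate` is a hypothetical callable that raises ValueError on a bad config.
    """
    with open(path, "rb") as f:
        data = f.read()
    disk_md5 = hashlib.md5(data).hexdigest()
    report = {
        "config_md5": disk_md5,                  # sum reported by the health check
        "config_mtime": os.path.getmtime(path),  # timestamp of the file on disk
        # Assumption: 1 means "disk matches memory", 0 means mismatch.
        "matches_loaded": 1 if disk_md5 == loaded_md5 else 0,
        "warnings": [],
    }
    if not report["matches_loaded"]:
        report["warnings"].append(
            "config file on disk does not match config in memory")
        try:
            validate(data)
        except ValueError as err:
            # The service would fail if restarted with this file.
            report["warnings"].append(
                f"config file on disk failed validation: {err}")
    return report
```

The `matches_loaded` value is the kind of number that could back the 0/1 metric, while `config_md5` and `config_mtime` are what an aggregator would compare across services.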

This phase of implementation is for reporting/detecting config changes only, not responding to them.
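
Because this phase only reports, the aggregator side can stay small. The following Python sketch groups services by the config digest each one reported and returns warnings without taking any action. The function name `find_config_skew`, the input shape (service name mapped to digest), and the warning wording are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def find_config_skew(reports):
    """Group services by reported config digest.

    `reports` maps a service name to the digest its health check reported.
    Returns a list of warning strings; an empty list means all digests agree.
    """
    by_digest = defaultdict(list)
    for service, digest in sorted(reports.items()):
        by_digest[digest].append(service)
    if len(by_digest) <= 1:
        return []
    # Report only: list which services run which config, change nothing.
    return [
        f"config {digest} in use by: {', '.join(services)}"
        for digest, services in sorted(by_digest.items())
    ]
```

A command line tool could call the same function and print the warnings, so that the tool and the aggregator share one piece of logic.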


Related issues

Related to Arvados Epics - Idea #18727: Avoid configuration skew between different services and hosts (Resolved; 03/01/2022; 05/31/2022)
Related to Arvados - Feature #18794: cluster health check fails if some services are using different configs (Resolved; Tom Clegg; 05/06/2022)
Actions #1

Updated by Peter Amstutz about 2 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz about 2 years ago

  • Status changed from In Progress to New
  • Subject changed from Ability to check what config is in use across the cluster to Design for ability to check what config is in use across the cluster
  • Tracker changed from Bug to Feature
Actions #3

Updated by Peter Amstutz about 2 years ago

  • Related to Idea #18727: Avoid configuration skew between different services and hosts added
Actions #4

Updated by Peter Amstutz about 2 years ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz about 2 years ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz about 2 years ago

  • Assigned To set to Tom Clegg
Actions #7

Updated by Tom Clegg about 2 years ago

  • Related to Feature #18794: cluster health check fails if some services are using different configs added
Actions #8

Updated by Tom Clegg about 2 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Peter Amstutz about 2 years ago

  • Status changed from In Progress to Resolved