Idea #18727
Avoid configuration skew between different services and hosts
Status: Resolved
Priority: Normal
Assigned To: -
Target version: -
Start date: 03/01/2022
Due date: 05/31/2022
Story points: -
Release:
Release relationship: Auto
Description
Background: With multiple back-end service components running on multiple hosts, it is possible to have services running with different configurations. In many cases, this happens by accident, and ends up causing problems for users/clients that are hard to diagnose.
Examples:
- If RailsAPI is not restarted after changing /etc/arvados/config.yml, it will continue using the old config -- except that when passenger starts new worker threads, they use the new config.
- If the instance types are updated, and controller is restarted but arvados-dispatch-cloud is not restarted, clients will see that the updated types are available, but scheduling decisions will be made based on the old types.
- If a Keep volume changes from read-only to read-write, and controller/RailsAPI are restarted but the relevant keepstore processes are not restarted, clients will waste time trying to write to the volume (which keepstore will refuse to do) before falling back to different volumes/servers.
Goals:
- Automatically detect when a version mismatch exists, and report this to the operator (via logs, health checks, metrics)
- Provide an easy mechanism for updating the configuration cluster-wide and signalling all services to restart/reload config as needed, thereby eliminating the most common causes of version mismatches (i.e., the operator fails to update config on all nodes or incorrectly identifies which services need to be restarted)
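The detection idea could be sketched roughly as follows. This is only an illustration, not the mechanism Arvados implements: it assumes each service can report a digest of the config content it loaded at startup, and that something like a health check collects those reports. The function names, the "service@host" labels, and the sample config snippets are all hypothetical.

```python
import hashlib
from collections import defaultdict


def config_digest(config_bytes: bytes) -> str:
    """Return a SHA-256 hex digest identifying one exact config content."""
    return hashlib.sha256(config_bytes).hexdigest()


def find_config_skew(reports: dict[str, str]) -> dict[str, list[str]]:
    """Group services by the config digest they report.

    `reports` maps a hypothetical "service@host" label to the digest of the
    config that process loaded. Returns an empty dict when every service
    agrees, otherwise a mapping of digest -> sorted list of services.
    """
    groups = defaultdict(list)
    for service, digest in reports.items():
        groups[digest].append(service)
    if len(groups) <= 1:
        return {}
    return {digest: sorted(services) for digest, services in groups.items()}


# Hypothetical scenario: instance types were updated, but
# arvados-dispatch-cloud was not restarted and still runs the old config.
old = config_digest(b"InstanceTypes: [small]\n")
new = config_digest(b"InstanceTypes: [small, large]\n")
reports = {
    "controller@host1": new,
    "arvados-dispatch-cloud@host1": old,
    "keepstore@host2": new,
}
skew = find_config_skew(reports)
```

A non-empty result tells the operator which processes are on which config, so it can be surfaced in logs, health checks, or metrics. Hashing the raw file content is the simplest choice here; a real implementation might hash a normalized form of the config instead, so that whitespace or comment changes do not register as skew.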
Updated by Tom Clegg almost 3 years ago
- Related to Idea #18256: Design bottom-up configuration/discovery strategy added
Updated by Tom Clegg almost 3 years ago
- Related to Idea #18685: Synchronize configuration on multi-node cluster added
Updated by Peter Amstutz almost 3 years ago
- Related to Feature #18768: Design for ability to check what config is in use across the cluster added
Updated by Peter Amstutz almost 3 years ago
- Start date set to 03/01/2022
- Due date set to 05/31/2022
Updated by Peter Amstutz almost 3 years ago
- Related to Bug #16345: Health check checks for clock and version skew added
Updated by Peter Amstutz over 2 years ago
- Related to Feature #18794: cluster health check fails if some services are using different configs added