Feature #20953
Monitoring for TLS certificate expiration
Description
Use something like x509-certificate-exporter to include certificate stats in monitoring/alerting:
https://github.com/enix/x509-certificate-exporter/blob/main/docs/grafana-dashboard.jpg
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-09-27 sprint to Development 2023-10-11 sprint
Updated by Peter Amstutz about 1 year ago
- Story points set to 2.0
- Tracker changed from Bug to Feature
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-10-11 sprint to Development 2023-10-25 sprint
Updated by Peter Amstutz about 1 year ago
- Subject changed from Monitoring / alerting for TLS certificate expiration to Monitoring for TLS certificate expiration
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-10-25 sprint to Development 2023-11-08 sprint
Updated by Lucas Di Pentima about 1 year ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima about 1 year ago
- File main-dashboard.png added
- File certificate-monitor.png added
Updates at 6280dbe - branch 20953-installer-tls-cert-monitoring
- Installs & configures blackbox_exporter so that it probes various HTTPS endpoints.
- Adds a new "Certificate & Connection Monitoring" dashboard that shows the time remaining until each certificate expires, along with the probe timings for each endpoint.
- Adds a new element on the main "Arvados cluster overview" dashboard to show the earliest cert expiration time.
- Uses different coloring thresholds (red/yellow/green) for the above dashboard elements, depending on whether Let's Encrypt certificates are in use.
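A minimal Python sketch of the quantity these panels display, assuming the dashboards follow the usual blackbox_exporter pattern: the exporter's `probe_ssl_earliest_cert_expiry` metric is a Unix timestamp, and the time remaining is that value minus the current time (in PromQL, `probe_ssl_earliest_cert_expiry - time()`). The helper names `days_until_expiry`, `fetch_not_after` and `earliest_days_left` are hypothetical. Note that `fetch_not_after` reads only the leaf certificate, whereas the exporter metric takes the earliest expiry across the presented chain.

```python
"""Hypothetical helpers: days until TLS certificate expiry (sketch only)."""
import socket
import ssl
import time
from typing import Mapping, Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter time (negative if expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)  # Unix timestamp, UTC
    if now is None:
        now = time.time()
    return (expiry - now) / SECONDS_PER_DAY


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and return the leaf cert's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]


def earliest_days_left(not_after_by_endpoint: Mapping[str, str],
                       now: Optional[float] = None) -> float:
    """Smallest days-remaining value across endpoints (one overview number)."""
    if not not_after_by_endpoint:
        raise ValueError("no endpoints given")
    return min(days_until_expiry(na, now) for na in not_after_by_endpoint.values())
```

For a live check, something like `days_until_expiry(fetch_not_after("example.com"))` would need network access; the hostname here is only an illustration. `earliest_days_left` mirrors the idea behind the single "earliest cert expiration" element on the overview dashboard: one number that reflects the most urgent endpoint.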
Examples
Updated by Tom Clegg about 1 year ago
This LGTM, thanks.
Two nitpicks:
Let's Encrypt issues 90-day certs, and recommends renewing every 60 days. With that schedule I'd suggest different thresholds: yellow for <29 days (one less than 30 so the timing of a daily cron job doesn't cause a yellow) and red for <22 days (arbitrarily picking 1 week after yellow).
For non-LE, my first thought was that turning yellow at 6 months seems pretty aggressive -- but then, non-LE renewal is more likely to be manual and/or very slow, so perhaps that's a good time to start looking into it.
Could the green/OK indicator on the Arvados Cluster Overview dashboard be more muted -- like a much darker green? It seems a bit odd for it to be so loud & prominent when it doesn't need attention. (If this were an overall status indicator that only turned green when literally every measurable thing looked good, then a bright green might seem like a more reasonable choice... but this is just one of many things that should be green, so it probably shouldn't look so decisive.)
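A minimal sketch of the Let's Encrypt thresholds suggested above, assuming each panel only needs to map "days remaining" to a status. The names `ExpiryThresholds`, `LETS_ENCRYPT` and `expiry_status` are hypothetical. The non-Let's Encrypt red threshold isn't stated in this thread, so no preset is given for it. The function returns "ok" rather than "green", so how the healthy state looks (bright, muted, or no highlight at all, per the second nitpick) remains a display decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpiryThresholds:
    """Strict "less than N days" cut-offs for each status (hypothetical)."""
    warn_below_days: float      # yellow
    critical_below_days: float  # red

    def __post_init__(self) -> None:
        if self.critical_below_days > self.warn_below_days:
            raise ValueError("critical threshold must not exceed warning threshold")


# Let's Encrypt issues 90-day certs and recommends renewing every 60 days,
# leaving ~30 days. Warning below 29 days gives a one-day margin for a daily
# renewal cron job; critical is one week after that.
LETS_ENCRYPT = ExpiryThresholds(warn_below_days=29, critical_below_days=22)


def expiry_status(days_left: float, thresholds: ExpiryThresholds) -> str:
    """Map days remaining to "red", "yellow" or "ok"."""
    if days_left < thresholds.critical_below_days:
        return "red"
    if days_left < thresholds.warn_below_days:
        return "yellow"
    return "ok"
```

For a non-Let's Encrypt certificate, the 6-month yellow threshold could be approximated as `warn_below_days=182`; the red value would still need to be chosen separately.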
Updated by Lucas Di Pentima about 1 year ago
Tom Clegg wrote in #note-10:
> Let's Encrypt issues 90-day certs, and recommends renewing every 60 days. With that schedule I'd suggest different thresholds: yellow for <29 days (one less than 30 so the timing of a daily cron job doesn't cause a yellow) and red for <22 days (arbitrarily picking 1 week after yellow).
Sounds good.
> Could the green/OK indicator on the Arvados Cluster Overview dashboard be more muted -- like a much darker green? It seems a bit odd for it to be so loud & prominent when it doesn't need attention. (If this were an overall status indicator that only turned green when literally every measurable thing looked good, then a bright green might seem like a more reasonable choice... but this is just one of many things that should be green, so it probably shouldn't look so decisive.)
This is a good idea! Instead of using a duller green, I'll make the background transparent.
Updated by Lucas Di Pentima about 1 year ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|2195844ba309db0ec552aa8b14a7f02cf74e9b7b.