Feature #20953

closed

Monitoring for TLS certificate expiration

Added by Peter Amstutz 8 months ago. Updated 9 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Deployment
Story points:
2.0
Release:
Release relationship:
Auto

Description

Use something like x509-certificate-exporter to include certificate stats in monitoring/alerting:

https://github.com/enix/x509-certificate-exporter/blob/main/docs/grafana-dashboard.jpg


Files

main-dashboard.png (107 KB), Lucas Di Pentima, 10/25/2023 08:28 PM
certificate-monitor.png (255 KB), Lucas Di Pentima, 10/25/2023 08:28 PM
transparent-bg.png (73.4 KB), Lucas Di Pentima, 10/26/2023 01:34 PM

Subtasks 1 (0 open, 1 closed)

Task #20962: Review 20953-installer-tls-cert-monitoring (Resolved, Lucas Di Pentima, 10/25/2023)
Actions #1

Updated by Peter Amstutz 8 months ago

  • Assigned To set to Lucas Di Pentima
Actions #2

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2023-09-27 sprint to Development 2023-10-11 sprint
Actions #3

Updated by Peter Amstutz 8 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 8 months ago

  • Story points set to 2.0
  • Tracker changed from Bug to Feature
Actions #5

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2023-10-11 sprint to Development 2023-10-25 sprint
Actions #6

Updated by Peter Amstutz 7 months ago

  • Subject changed from Monitoring / alerting for TLS certificate expiration to Monitoring for TLS certificate expiration
Actions #7

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-10-25 sprint to Development 2023-11-08 sprint
Actions #8

Updated by Lucas Di Pentima 7 months ago

  • Status changed from New to In Progress
Actions #9

Updated by Lucas Di Pentima 7 months ago

Updates at 6280dbe - branch 20953-installer-tls-cert-monitoring

  • Installs & configures blackbox_exporter so that it probes various HTTPS endpoints.
  • Adds a new "Certificate & Connection Monitoring" dashboard that shows how much time remains until certificates expire, along with the different probing times for each endpoint.
  • Adds a new element on the main "Arvados cluster overview" dashboard to show the earliest cert expiration time.
  • Uses different coloring thresholds (red/yellow/green) on the above dashboard elements depending on whether Let's Encrypt certificates are in use.
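
For illustration only, the number behind the expiration panels is the time left until a certificate's notAfter date; blackbox_exporter reports the earliest such date per probe as a Unix timestamp in its probe_ssl_earliest_cert_expiry metric. The sketch below does the same arithmetic in plain Python. It is not the query used on the branch, and the function name and the `now` parameter are hypothetical.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return the days between `now` and a certificate's notAfter time.

    `not_after` uses the format found in ssl.SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". `now` is a Unix timestamp and
    defaults to the current time. The result is negative once the
    certificate has expired.
    """
    if now is None:
        now = time.time()
    expiry = ssl.cert_time_to_seconds(not_after)
    return (expiry - now) / SECONDS_PER_DAY
```

Passing `now` explicitly keeps the calculation reproducible; leaving it out measures against the current clock.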

Examples

Actions #10

Updated by Tom Clegg 7 months ago

This LGTM, thanks.

Two nitpicks:

Let's Encrypt issues 90-day certs, and recommends renewing every 60 days. With that schedule I'd suggest different thresholds: yellow for <29 days (one less than 30 so the timing of a daily cron job doesn't cause a yellow) and red for <22 days (arbitrarily picking 1 week after yellow).
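
As a rough sketch of the thresholds suggested above (not the dashboard configuration itself): a 90-day certificate renewed at day 60 should never have much less than 30 days left, so anything under 29 days means a renewal was missed. The function and parameter names here are hypothetical.

```python
def expiry_color(days_left, yellow_below=29, red_below=22):
    """Map days until expiration to a dashboard color.

    Defaults follow the proposed Let's Encrypt thresholds:
    yellow below 29 days, red below 22 days, green otherwise.
    """
    if days_left < red_below:
        return "red"
    if days_left < yellow_below:
        return "yellow"
    return "green"
```

With these defaults, a certificate with exactly 29 days left still reads green, which gives a daily renewal job a day of slack before the indicator changes.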

For non-LE, my first thought was that turning yellow at 6 months seems pretty aggressive -- but then, non-LE renewal is more likely to be manual and/or very slow, so perhaps that's a good time to start looking into it.

Could the green/OK indicator on the Arvados Cluster Overview dashboard be more muted -- like a much darker green? It seems a bit odd for it to be so loud & prominent when it doesn't need attention. (If this were an overall status indicator that only turned green when literally every measurable thing looked good, then a bright green might seem like a more reasonable choice... but this is just one of many things that should be green, so it probably shouldn't look so decisive.)

Actions #11

Updated by Lucas Di Pentima 7 months ago

Tom Clegg wrote in #note-10:

Let's Encrypt issues 90-day certs, and recommends renewing every 60 days. With that schedule I'd suggest different thresholds: yellow for <29 days (one less than 30 so the timing of a daily cron job doesn't cause a yellow) and red for <22 days (arbitrarily picking 1 week after yellow).

Sounds good.

Could the green/OK indicator on the Arvados Cluster Overview dashboard be more muted -- like a much darker green? It seems a bit odd for it to be so loud & prominent when it doesn't need attention. (If this were an overall status indicator that only turned green when literally every measurable thing looked good, then a bright green might seem like a more reasonable choice... but this is just one of many things that should be green, so it probably shouldn't look so decisive.)

This is a good idea! Instead of using a dull green color, I'll set the background to transparent.

Actions #13

Updated by Lucas Di Pentima 7 months ago

  • Status changed from In Progress to Resolved
Actions #14

Updated by Peter Amstutz 9 days ago

  • Release set to 70