Idea #13198

closed

[Keep-web] Add metrics endpoint

Added by Tom Morris over 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 07/17/2018
Due date:
Story points: 0.5
Release:
Release relationship: Auto

Description

Use same approach as keepstore metrics added in #13025 (prometheus, etc).

Easiest metrics to provide:
  • reqDuration (partitioned by method and status) using promhttp.InstrumentHandlerDuration
  • timeToStatus (ditto) using log.AddHook, as in #13025

This should be refactored into a go package (sdk/go/httpserver?) instead of copying code from keepstore to keep-web.

Keep-web specific metrics to provide:
  • time to fetch block from keep
  • cache hits, misses

Subtasks 6 (0 open, 6 closed)

Task #13733: Review 13198-keep-web-metrics (Resolved, Tom Clegg, 07/17/2018)
Task #13877: Review 13198-keep-web-metrics (Resolved, Tom Clegg, 07/17/2018)
Task #13942: Remove metrics from status.json (Resolved, Tom Clegg, 07/17/2018)
Task #13943: Update https://doc.arvados.org/admin/metrics.html (Resolved, Tom Clegg, 07/17/2018)
Task #13983: Review 13198-keep-web-metrics (Resolved, Tom Clegg, 07/17/2018)
Task #14035: Review 13198-metrics-docs (Resolved, Tom Clegg, 07/17/2018)
Actions #1

Updated by Tom Morris over 6 years ago

  • Subject changed from [Keep-web] Add track and report metrics through monitoring interface to [Keep-web]Track and report metrics through monitoring interface
Actions #2

Updated by Tom Clegg over 6 years ago

  • Description updated (diff)
  • Subject changed from [Keep-web]Track and report metrics through monitoring interface to [Keep-web] Add metrics endpoint
Actions #3

Updated by Tom Morris over 6 years ago

  • Story points set to 3.0
Actions #4

Updated by Tom Morris over 6 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
Actions #5

Updated by Tom Morris over 6 years ago

  • Assigned To set to Tom Clegg
  • Target version changed from Arvados Future Sprints to 2018-07-18 Sprint
Actions #6

Updated by Peter Amstutz over 6 years ago

On the topic of metrics and health checks, we should add a page to the "Admin" section of the documentation that describes which components have endpoints and how to use the health check aggregator. That would address the situation where no one remembers where we are with the project of implementing health checks / metrics: at least it would be written down.

Actions #7

Updated by Peter Amstutz over 6 years ago

Documenting health checks / metrics story: #13791

Actions #9

Updated by Tom Clegg over 6 years ago

  • Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint
Actions #10

Updated by Tom Clegg over 6 years ago

  • Story points changed from 3.0 to 0.5
Actions #11

Updated by Tom Clegg over 6 years ago

  • Status changed from New to In Progress
Actions #12

Updated by Lucas Di Pentima over 6 years ago

Tried manually and all seems to work great.
One question though: Are the "time to fetch block" and "cache hits/misses" going to be implemented later / discarded? If yes, then it LGTM.

Actions #13

Updated by Tom Clegg over 6 years ago

Right, this branch (just merged) only offers request timing.

Keeping issue open for the keep-web-specific metrics.

Actions #14

Updated by Tom Clegg over 6 years ago

13198-keep-web-metrics @ 413db07b4c81ea08663f90f31ee03227349d2be4 https://ci.curoverse.com/view/All/job/developer-run-tests/832/

This doesn't have "time to fetch block". It just takes the cache metrics we were already collecting (and exporting in status.json) and exports them as prometheus counters/gauges.

Actions #15

Updated by Lucas Di Pentima over 6 years ago

This LGTM, but I think it would be nice to avoid computing increments twice for every counter. Couldn't /status.json be some sort of prometheus client?
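
One way to read this suggestion, sketched with the standard library only: increment each counter once, then render both the JSON status and the Prometheus text from the same snapshot. This is not what the branch does; the counters type, the JSON field names, and the example_* metric names are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync/atomic"
)

// counters holds the single source of truth; each event is counted once.
type counters struct {
	hits   uint64
	misses uint64
}

func (c *counters) Hit()  { atomic.AddUint64(&c.hits, 1) }
func (c *counters) Miss() { atomic.AddUint64(&c.misses, 1) }

// snapshot is a point-in-time copy that both output formats are built from.
type snapshot struct {
	Hits   uint64 `json:"Hits"`
	Misses uint64 `json:"Misses"`
}

func (c *counters) Snapshot() snapshot {
	return snapshot{
		Hits:   atomic.LoadUint64(&c.hits),
		Misses: atomic.LoadUint64(&c.misses),
	}
}

// JSON renders the snapshot the way a status.json endpoint might.
func (s snapshot) JSON() ([]byte, error) {
	return json.Marshal(s)
}

// PrometheusText renders the same values in Prometheus text exposition format.
func (s snapshot) PrometheusText() string {
	return fmt.Sprintf(
		"# TYPE example_cache_hits_total counter\nexample_cache_hits_total %d\n"+
			"# TYPE example_cache_misses_total counter\nexample_cache_misses_total %d\n",
		s.Hits, s.Misses)
}

func main() {
	c := &counters{}
	c.Hit()
	c.Hit()
	c.Hit()
	c.Miss()

	snap := c.Snapshot()
	js, err := snap.JSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(js)) // prints {"Hits":3,"Misses":1}
	fmt.Print(snap.PrometheusText())
}
```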

Actions #16

Updated by Tom Clegg over 6 years ago

  • Target version changed from 2018-08-01 Sprint to 2018-08-15 Sprint
Actions #19

Updated by Tom Clegg over 6 years ago

  • Target version changed from 2018-08-15 Sprint to 2018-09-05 Sprint
Actions #20

Updated by Lucas Di Pentima over 6 years ago

Just one comment:

  • I noticed that the documentation style changed with these new additions. All other items are presented in tables rather than bullet point lists, and I think the tabular layout is clearer to read.

Other than that, lgtm.

Actions #21

Updated by Tom Clegg over 6 years ago

  • Status changed from In Progress to Resolved
Actions #22

Updated by Ward Vandewege about 6 years ago

  • Release set to 13