Idea #13198
Closed: [Keep-web] Add metrics endpoint
Description
Use the same approach as the keepstore metrics added in #13025 (Prometheus, etc.).
Easiest metrics to provide:
- reqDuration (partitioned by method and status) using promhttp.InstrumentHandlerDuration
- timeToStatus (ditto) using log.AddHook, as in #13025
This should be refactored into a Go package (sdk/go/httpserver?) instead of copying code from keepstore to keep-web.
Keep-web-specific metrics to provide:
- time to fetch block from Keep
- cache hits, misses
Updated by Tom Morris almost 7 years ago
- Subject changed from [Keep-web] Add track and report metrics through monitoring interface to [Keep-web]Track and report metrics through monitoring interface
Updated by Tom Clegg over 6 years ago
- Description updated (diff)
- Subject changed from [Keep-web]Track and report metrics through monitoring interface to [Keep-web] Add metrics endpoint
Updated by Tom Morris over 6 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
Updated by Tom Morris over 6 years ago
- Assigned To set to Tom Clegg
- Target version changed from Arvados Future Sprints to 2018-07-18 Sprint
Updated by Peter Amstutz over 6 years ago
On the topic of metrics and health checks, we should add a page to the "Admin" section of the documentation that describes which components have endpoints and how to use the health check aggregator. That would address the situation where no one remembers how far along we are with implementing health checks and metrics; at least it would be written down.
Updated by Peter Amstutz over 6 years ago
Documenting health checks / metrics story: #13791
Updated by Tom Clegg over 6 years ago
13198-keep-web-metrics @ 0011b5236fc9a562bc13f943f9a431c496b2b7cd https://ci.curoverse.com/job/developer-run-tests/815/
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint
Updated by Lucas Di Pentima over 6 years ago
Tried it manually and everything seems to work great.
One question, though: are the "time to fetch block" and "cache hits/misses" metrics going to be implemented later, or discarded? If so, then LGTM.
Updated by Tom Clegg over 6 years ago
Right, this branch (just merged) only offers request timing.
Keeping issue open for the keep-web-specific metrics.
Updated by Tom Clegg over 6 years ago
13198-keep-web-metrics @ 413db07b4c81ea08663f90f31ee03227349d2be4 https://ci.curoverse.com/view/All/job/developer-run-tests/832/
This doesn't have "time to fetch block". It just takes the cache metrics we were already collecting (and exporting in status.json) and exports them as Prometheus counters/gauges.
Updated by Lucas Di Pentima over 6 years ago
This LGTM, but I think it would be nice to avoid computing increments twice for every counter. Couldn't /status.json be some sort of Prometheus client?
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-08-01 Sprint to 2018-08-15 Sprint
Updated by Lucas Di Pentima over 6 years ago
Updates at 6db4c94527903f403e77ba4fce2d1d5fe4e29b03 LGTM.
Updated by Tom Clegg over 6 years ago
- Adds a keep-web section to https://doc.arvados.org/admin/metrics.html
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-08-15 Sprint to 2018-09-05 Sprint
Updated by Lucas Di Pentima over 6 years ago
Just one comment:
- I noticed that the documentation style changed with these new additions. All the other items are listed in tables rather than bullet lists; I think the tabular format is easier to read.
Other than that, LGTM.
Updated by Tom Clegg over 6 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|789e15f578cf4464834ecae347fdfe0d337b7464.