Story #13198

[Keep-web] Add metrics endpoint

Added by Tom Morris over 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 07/17/2018
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5
Release:
Release relationship: Auto

Description

Use the same approach as the keepstore metrics added in #13025 (Prometheus, etc.).

Easiest metrics to provide:
  • reqDuration (partitioned by method and status) using promhttp.InstrumentHandlerDuration
  • timeToStatus (ditto) using log.AddHook, as in #13025
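
A minimal sketch of the "request duration partitioned by method and status" idea, using only the Go standard library. This is a stand-in to show the shape of the wrapper, not the promhttp implementation and not the code from #13025; with the Prometheus client library the wrapper would instead be promhttp.InstrumentHandlerDuration applied to a metric vector labeled by "code" and "method". All type and function names here (durationStats, instrument, demoStats) are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// durationKey partitions observations by HTTP method and status code.
type durationKey struct {
	Method string
	Code   int
}

// durationStats accumulates a request count and total duration per key.
// (Hypothetical type; a real setup would use a Prometheus summary or histogram.)
type durationStats struct {
	mu    sync.Mutex
	count map[durationKey]int
	total map[durationKey]time.Duration
}

func newDurationStats() *durationStats {
	return &durationStats{
		count: map[durationKey]int{},
		total: map[durationKey]time.Duration{},
	}
}

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.code = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument wraps next and records how long each request took.
func (s *durationStats) instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		// Default to 200: that is what net/http sends if WriteHeader is never called.
		rec := &statusRecorder{ResponseWriter: w, code: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, req)
		elapsed := time.Since(start)

		key := durationKey{Method: req.Method, Code: rec.code}
		s.mu.Lock()
		defer s.mu.Unlock()
		s.count[key]++
		s.total[key] += elapsed
	})
}

// requests returns the number of observations for a method/status pair.
func (s *durationStats) requests(method string, code int) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.count[durationKey{Method: method, Code: code}]
}

// demoStats sends two GETs and one PUT through an instrumented example handler.
func demoStats() *durationStats {
	stats := newDurationStats()
	handler := stats.instrument(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.Method != http.MethodGet {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		fmt.Fprintln(w, "ok")
	}))
	for _, method := range []string{"GET", "GET", "PUT"} {
		handler.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(method, "/example", nil))
	}
	return stats
}

func main() {
	stats := demoStats()
	fmt.Println("GET 200:", stats.requests("GET", 200))
	fmt.Println("PUT 405:", stats.requests("PUT", 405))
}
```

Putting the wrapper in a shared package means each service only has to wrap its top-level handler once.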

This should be refactored into a Go package (sdk/go/httpserver?) instead of copying code from keepstore to keep-web.

Keep-web specific metrics to provide:
  • time to fetch block from keep
  • cache hits, misses
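
A rough sketch of how the two keep-web-specific metrics could be collected, assuming a simple in-memory cache in front of a slow backend fetch. The meteredCache type, its fields, and the fake fetch function are all hypothetical and do not reflect keep-web's actual cache; the point is only where the counters and the timer sit.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// meteredCache is a hypothetical cache that counts hits and misses and
// accumulates the time spent fetching from the backend on each miss.
type meteredCache struct {
	mu        sync.Mutex
	entries   map[string][]byte
	fetch     func(key string) ([]byte, error) // stand-in for a backend read
	hits      uint64
	misses    uint64
	fetchTime time.Duration
}

func newMeteredCache(fetch func(string) ([]byte, error)) *meteredCache {
	return &meteredCache{entries: map[string][]byte{}, fetch: fetch}
}

// get returns the cached value, or fetches and caches it on a miss.
// For simplicity the lock is held during the fetch; a real cache would
// probably not serialize backend reads like this.
func (c *meteredCache) get(key string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if data, ok := c.entries[key]; ok {
		c.hits++
		return data, nil
	}
	c.misses++
	start := time.Now()
	data, err := c.fetch(key)
	c.fetchTime += time.Since(start)
	if err != nil {
		return nil, err
	}
	c.entries[key] = data
	return data, nil
}

// counts returns a snapshot of the hit and miss counters.
func (c *meteredCache) counts() (hits, misses uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.hits, c.misses
}

// demoCache reads keys "a", "a", "b": one hit and two misses.
func demoCache() *meteredCache {
	cache := newMeteredCache(func(key string) ([]byte, error) {
		return []byte("data:" + key), nil
	})
	for _, key := range []string{"a", "a", "b"} {
		if _, err := cache.get(key); err != nil {
			panic(err)
		}
	}
	return cache
}

func main() {
	hits, misses := demoCache().counts()
	fmt.Println("hits:", hits, "misses:", misses)
}
```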

Subtasks

Task #13733: Review 13198-keep-web-metrics (Resolved, Tom Clegg)

Task #13877: Review 13198-keep-web-metrics (Resolved, Tom Clegg)

Task #13942: Remove metrics from status.json (Resolved, Tom Clegg)

Task #13943: Update https://doc.arvados.org/admin/metrics.html (Resolved, Tom Clegg)

Task #13983: Review 13198-keep-web-metrics (Resolved, Tom Clegg)

Task #14035: Review 13198-metrics-docs (Resolved, Tom Clegg)

Associated revisions

Revision d6e1bfee
Added by Tom Clegg almost 2 years ago

Merge branch '13198-keep-web-metrics'

refs #13198

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 0be45d32
Added by Tom Clegg almost 2 years ago

Merge branch '13198-keep-web-metrics'

refs #13198

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 4c5600a8
Added by Tom Clegg almost 2 years ago

Merge branch '13198-keep-web-metrics'

refs #13198

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 789e15f5
Added by Tom Clegg almost 2 years ago

Merge branch '13198-metrics-docs'

closes #13198

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Morris over 2 years ago

  • Subject changed from [Keep-web] Add track and report metrics through monitoring interface to [Keep-web]Track and report metrics through monitoring interface

#2 Updated by Tom Clegg about 2 years ago

  • Description updated (diff)
  • Subject changed from [Keep-web]Track and report metrics through monitoring interface to [Keep-web] Add metrics endpoint

#3 Updated by Tom Morris about 2 years ago

  • Story points set to 3.0

#4 Updated by Tom Morris about 2 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints

#5 Updated by Tom Morris about 2 years ago

  • Assigned To set to Tom Clegg
  • Target version changed from Arvados Future Sprints to 2018-07-18 Sprint

#6 Updated by Peter Amstutz almost 2 years ago

On the topic of metrics and health checks, we should add a page to the "Admin" section of the documentation that describes which components have endpoints and how to use the health check aggregator. That would address the situation where no one remembers where we are with implementing health checks / metrics; at least it would be written down.

#7 Updated by Peter Amstutz almost 2 years ago

Documenting health checks / metrics story: #13791

#9 Updated by Tom Clegg almost 2 years ago

  • Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint

#10 Updated by Tom Clegg almost 2 years ago

  • Story points changed from 3.0 to 0.5

#11 Updated by Tom Clegg almost 2 years ago

  • Status changed from New to In Progress

#12 Updated by Lucas Di Pentima almost 2 years ago

Tried manually and all seems to work great.
One question though: are the "time to fetch block" and "cache hits/misses" metrics going to be implemented later, or discarded? If so, then it LGTM.

#13 Updated by Tom Clegg almost 2 years ago

Right, this branch (just merged) only offers request timing.

Keeping issue open for the keep-web-specific metrics.

#14 Updated by Tom Clegg almost 2 years ago

13198-keep-web-metrics @ 413db07b4c81ea08663f90f31ee03227349d2be4 https://ci.curoverse.com/view/All/job/developer-run-tests/832/

This doesn't have "time to fetch block". It just takes the cache metrics we were already collecting (and exporting in status.json) and exports them as Prometheus counters/gauges.

#15 Updated by Lucas Di Pentima almost 2 years ago

This LGTM, but I think it would be nice to avoid computing increments twice for every counter. Couldn't /status.json be some sort of Prometheus client?
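
One way to picture "increment once, report twice" is a single set of counters with two read-only views: a JSON view for something like /status.json and a Prometheus text-format view for a metrics endpoint. The sketch below uses only the standard library; the counters type and the metric names are hypothetical and not taken from the branch under review.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// counters is a hypothetical single source of truth: each event is
// counted once, and both output formats read from the same map.
type counters struct {
	mu     sync.Mutex
	values map[string]uint64
}

func newCounters() *counters {
	return &counters{values: map[string]uint64{}}
}

// inc adds one to the named counter.
func (c *counters) inc(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[name]++
}

// statusJSON renders the counters as a JSON object (keys sorted by encoding/json).
func (c *counters) statusJSON() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	buf, err := json.Marshal(c.values)
	return string(buf), err
}

// promText renders the same counters in the Prometheus text exposition format.
func (c *counters) promText() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	names := make([]string, 0, len(c.values))
	for name := range c.values {
		names = append(names, name)
	}
	sort.Strings(names) // stable output order
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "# TYPE %s counter\n%s %d\n", name, name, c.values[name])
	}
	return b.String()
}

// demoCounters records two hits and one miss.
func demoCounters() *counters {
	c := newCounters()
	c.inc("cache_hits")
	c.inc("cache_hits")
	c.inc("cache_misses")
	return c
}

func main() {
	c := demoCounters()
	js, err := c.statusJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(js)
	fmt.Print(c.promText())
}
```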

#16 Updated by Tom Clegg almost 2 years ago

  • Target version changed from 2018-08-01 Sprint to 2018-08-15 Sprint

#19 Updated by Tom Clegg almost 2 years ago

  • Target version changed from 2018-08-15 Sprint to 2018-09-05 Sprint

#20 Updated by Lucas Di Pentima almost 2 years ago

Just one comment:

  • I noticed that the documentation style changed with these new additions. All other items are listed in tables instead of bullet lists, and I think the tabular layout is clearer to read.

Other than that, lgtm.

#21 Updated by Tom Clegg almost 2 years ago

  • Status changed from In Progress to Resolved

#22 Updated by Ward Vandewege almost 2 years ago

  • Release set to 13
