Feature #13025

[keepstore] runtime metrics endpoint

Added by Tom Clegg about 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
02/08/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

Start by exporting request time statistics using https://godoc.org/github.com/prometheus/client_golang/prometheus or similar.


Subtasks

Task #13027: Review 13025-keepstore-metricsResolvedTom Clegg


Related issues

Related to Arvados - Story #10484: [keepstore] Add S3 bucket statistics and current #clients to status.jsonResolved11/17/2016

Associated revisions

Revision 4d52d030
Added by Tom Clegg about 3 years ago

Merge branch '13025-keepstore-metrics'

refs #13025

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Morris about 3 years ago

  • Related to Story #10484: [keepstore] Add S3 bucket statistics and current #clients to status.json added

#2 Updated by Tom Morris about 3 years ago

Here are some of the background notes from previous discussions:

Not to dive into implementation details, but introducing a general purpose metrics package may provide benefits going forward.

I've used the Dropwizard metrics in the past and like them. A Go port of that package can be found here: https://github.com/armon/go-metrics

There's also: https://github.com/rcrowley/go-metrics which is described in this blog post: http://blog.dekstroza.io/monitoring-software-written-in-golang/

  • June 2017 email before status.json meeting:

Some background reading:

- metrics vs logs - https://grafana.com/blog/2016/01/05/logs-and-metrics-and-graphs-oh-my/
- Dropwizard metrics - http://metrics.dropwizard.io/3.2.2/
- Go port of above - https://github.com/rcrowley/go-metrics
- Coda Hale's (author of Dropwizard) Go metrics package - https://github.com/codahale/metrics
- Prometheus - https://prometheus.io/docs/introduction/comparison/

I think we have two different things that we're interested here:
1. Health checks
2. Metrics

We should be sure to distinguish between them and understand which we're talking about at any given time.

It seems like some of the metrics from #10484 are candidates to get consolidated here, but probably best to focus on additional metrics first.

#3 Updated by Tom Clegg about 3 years ago

  • Status changed from New to In Progress

#4 Updated by Tom Clegg about 3 years ago

[{
  "name": "request_duration_seconds",
  "help": "Summary of request duration.",
  "type": "SUMMARY",
  "metric": [
    {
      "label": [
        {
          "name": "code",
          "value": "200" 
        },
        {
          "name": "method",
          "value": "get" 
        } 
      ],
      "summary": {
        "sample_count": "1",
        "sample_sum": 0.000342534,
        "quantile": [
          {
            "quantile": 0.5,
            "value": 0.000342534
          },
          {
            "quantile": 0.9,
            "value": 0.000342534
          },
          {
            "quantile": 0.99,
            "value": 0.000342534
          } 
        ] 
      }
    },
    { 
      "label": [
        { 
          "name": "code",
          "value": "200" 
        },
        { 
          "name": "method",
          "value": "put" 
        }
      ],
      "summary": {
        "sample_count": "9",
        "sample_sum": 0.258729368,
        "quantile": [
          { 
            "quantile": 0.5,
            "value": 0.023052607
          },
          {
            "quantile": 0.9,
            "value": 0.072060673
          },
          {
            "quantile": 0.99,
            "value": 0.072060673
          }
        ]
      }
    }
  ]
}]

#5 Updated by Tom Clegg about 3 years ago

13025-keepstore-metrics @ 85398442d095524c5b2f315c294ea81f9d17853b

This adds "request_duration_seconds" and "time_to_status_seconds" summary metrics.

I haven't added s3 and Azure backend stats yet, but I don't think that's a reason to delay merging these basic stats.

#6 Updated by Lucas Di Pentima about 3 years ago

  • Some tests are failing: https://ci.curoverse.com/job/developer-run-tests/573/
  • File services/keepstore/config.go
    • Line 99: I think the ‘else’ clause is not needed.
    • Line 197: Can ‘req’ param be replaced by _ on exportJSON()’s signature if it’s not going to be used?
  • Does this feature need some sort of documentation?

#7 Updated by Tom Clegg about 3 years ago

Lucas Di Pentima wrote:

Aha... keep-web was counting on WroteStatus() to return 0 when there has been no explicit call to WriteHeader(). Reverted that change and fixed the status=0 logging bug at log-emitting time instead.

  • File services/keepstore/config.go
    • Line 99: I think the ‘else’ clause is not needed.

But without the else, log.Formatter wouldn't get set...? (I rearranged this to put the "else" case in the function body, as recommended by golint.)

  • Line 197: Can ‘req’ param be replaced by _ on exportJSON()’s signature if it’s not going to be used?

Can be, but even go vet and golint seem ok with it... :)

  • Does this feature need some sort of documentation?

Sure, although we might try it for a bit before we advertise it.

13025-keepstore-metrics @ 3826a6339ba1c901c054053920ed20547b3ba54d

https://ci.curoverse.com/job/developer-run-tests/577/

#8 Updated by Lucas Di Pentima about 3 years ago

LGTM, thanks!

#9 Updated by Tom Clegg about 3 years ago

  • Status changed from In Progress to Resolved

#10 Updated by Tom Morris over 2 years ago

  • Release set to 17

Also available in: Atom PDF