Feature #14285

[keep-balance] metrics endpoint

Added by Tom Clegg about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
10/11/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0
Release:
Release relationship:
Auto

Description

Export the current storage status (currently being logged as formatted text) as prometheus metrics.


Subtasks

Task #14288: Review 14285-keep-balance-metricsResolvedLucas Di Pentima

Task #14357: Review 14285-keep-balance-metricsResolvedTom Clegg

Associated revisions

Revision f29266f2
Added by Tom Clegg about 1 year ago

Merge branch '14285-keep-balance-metrics'

refs #14285

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision d6a21ab8 (diff)
Added by Tom Clegg about 1 year ago

Fix missing namespace in metric names.

refs #14285

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 0afa1718
Added by Tom Clegg about 1 year ago

Merge branch '14285-keep-balance-metrics'

closes #14285

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Clegg about 1 year ago

  • Target version set to 2018-10-17 sprint
  • Story points set to 1.0

#2 Updated by Tom Clegg about 1 year ago

14285-keep-balance-metrics @ 182ee5168acc0dee39653ba3b5f0b4903f8b8b7c
  • changes logging format to JSON (was timestamped plaintext)
  • adds backend storage metrics: keep_{garbage,lost,overreplicated,total,transient,underreplicated}_{blocks,bytes,replicas}
  • adds keep-balance timing metrics: keepbalance_{changeset_compute,get_state,send_pull_list,send_trash_lists,sweep}_seconds
  • updates doc page

TODO: require management token

#3 Updated by Tom Clegg about 1 year ago

  • Status changed from New to In Progress

#4 Updated by Lucas Di Pentima about 1 year ago

Should the ‘Listen’ config param be added to usage.go? The rest lgtm.

#5 Updated by Tom Clegg about 1 year ago

Metrics endpoints for keepstore, keep-web, keep-balance now require ManagementToken, if one is configured.

Added Listen and ManagementToken to usage.go.

14285-keep-balance-metrics @ 1f7a87ea4c342d16e0992872e2893cb6a2da92e9 https://ci.curoverse.com/view/Developer/job/developer-run-tests/928/

#7 Updated by Lucas Di Pentima about 1 year ago

A couple of questions:

  • I’m not understanding how LoadToken() comes into play with the rest of the code, it seems to not being called from anywhere, could you please explain?
  • Is sdk/go/auth getting big enough to deserve some testing?

#8 Updated by Tom Clegg about 1 year ago

Lucas Di Pentima wrote:

A couple of questions:

  • I’m not understanding how LoadToken() comes into play with the rest of the code, it seems to not being called from anywhere, could you please explain?

It's a middleware that attaches a *Credentials to the request, for CredentialsFromRequest can return when called (perhaps repeatedly) by inner handlers. The most likely use for it is in controller, but not until it gets cleaned up a bit.

  • Is sdk/go/auth getting big enough to deserve some testing?

Yes, added some handler tests. There's room for more, but how far should we go?

14285-keep-balance-metrics @ 7ad5d87a9e2d67224ed440c9320f1850cbaf6ae1

#9 Updated by Lucas Di Pentima about 1 year ago

Thanks for the explanation, I had the suspicion that it was for future use.
This LGTM, please merge.

#10 Updated by Tom Clegg about 1 year ago

14285-keep-balance-metrics @ faeb978ad985f983ec8dc18292524e0bd7bbda03
  • adds arvados_keep_dedup_block_ratio and ..._byte_ratio metrics
  • rough measure of storage cost savings as a result of content addressing
  • ratio=2.0 would mean the average block is referenced by two collections

#11 Updated by Lucas Di Pentima about 1 year ago

Just one question: Would it be convenient to use uin64 instead of int64 on collectionB.+s vars?
The rest LGTM.

#12 Updated by Tom Clegg about 1 year ago

Lucas Di Pentima wrote:

Just one question: Would it be convenient to use uin64 instead of int64 on collectionB.+s vars?

I'm inclined to use signed ints anywhere arithmetic might be done -- uints have weird properties like 0-1 > 0 that can produce subtle bugs later on.

#13 Updated by Tom Clegg about 1 year ago

  • Status changed from In Progress to Resolved

#14 Updated by Tom Morris about 1 year ago

  • Release set to 14

Also available in: Atom PDF