Feature #15340

[arvados-dispatch-cloud] Error-counting metrics

Added by Tom Clegg about 1 year ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Story points: 1.0
Release relationship: Auto

Description

Add to Prometheus metrics:

counter vector arvados_dispatchcloud_driver_operations
  • number of cloud operations, split by operation type (op=Create/Destroy/List/SetTags) and result (error=0/1)
  • can be implemented as a driver proxy similar to rateLimitedInstanceSet in source:lib/dispatchcloud/driver.go
  • most likely usage in graphs/alerts is arvados_dispatchcloud_driver_operations{error="1"}
counter vector arvados_dispatchcloud_instances_disappeared
  • number of times an instance disappeared in cloud (see sync() in source:lib/dispatchcloud/worker/pool.go), split by state
  • most likely usage in graphs/alerts is arvados_dispatchcloud_instances_disappeared{state!="shutdown"}
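
The driver-proxy idea for the first metric can be sketched roughly as below. This is a minimal illustration and not the code merged for this ticket: the instanceSet interface is a hypothetical, pared-down stand-in for the real cloud driver interface, opCounter is a standard-library stand-in for a Prometheus counter vector with labels "op" and "error", and flakySet is a fake driver that exists only for demonstration.

```go
package main

import (
	"errors"
	"sync"
)

// instanceSet is a hypothetical, simplified driver interface. The real
// interface in lib/dispatchcloud has different method signatures.
type instanceSet interface {
	Create(name string) error
	Destroy(name string) error
	List() ([]string, error)
	SetTags(name string, tags map[string]string) error
}

// opCounter stands in for a counter vector keyed by (op, error) labels.
type opCounter struct {
	mu     sync.Mutex
	counts map[[2]string]int
}

func newOpCounter() *opCounter {
	return &opCounter{counts: map[[2]string]int{}}
}

// inc adds one to the series for op, with error="1" if err is non-nil
// and error="0" otherwise.
func (c *opCounter) inc(op string, err error) {
	errLabel := "0"
	if err != nil {
		errLabel = "1"
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[2]string{op, errLabel}]++
}

// get returns the current value of one (op, error) series.
func (c *opCounter) get(op, errLabel string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[[2]string{op, errLabel}]
}

// countingInstanceSet is the proxy: it forwards each call to the wrapped
// driver and records the operation type and whether it failed.
type countingInstanceSet struct {
	instanceSet
	counter *opCounter
}

func (s countingInstanceSet) Create(name string) error {
	err := s.instanceSet.Create(name)
	s.counter.inc("Create", err)
	return err
}

func (s countingInstanceSet) Destroy(name string) error {
	err := s.instanceSet.Destroy(name)
	s.counter.inc("Destroy", err)
	return err
}

func (s countingInstanceSet) List() ([]string, error) {
	names, err := s.instanceSet.List()
	s.counter.inc("List", err)
	return names, err
}

func (s countingInstanceSet) SetTags(name string, tags map[string]string) error {
	err := s.instanceSet.SetTags(name, tags)
	s.counter.inc("SetTags", err)
	return err
}

// flakySet is a fake driver whose Destroy always fails.
type flakySet struct{}

func (flakySet) Create(string) error                     { return nil }
func (flakySet) Destroy(string) error                    { return errors.New("cloud API unavailable") }
func (flakySet) List() ([]string, error)                 { return nil, nil }
func (flakySet) SetTags(string, map[string]string) error { return nil }
```

Wrapping a driver as countingInstanceSet{instanceSet: flakySet{}, counter: newOpCounter()} and calling Destroy once increments the (Destroy, 1) series. In a real deployment the map would presumably be replaced by a Prometheus counter vector registered under the metric name given above, so that error="1" series can be graphed or alerted on.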

Related issues

Blocks Arvados - Story #13908: [Epic] Replace SLURM for cloud job scheduling/dispatching (New)

Associated revisions

Revision 8b9fef1b
Added by Tom Clegg about 1 year ago

Merge branch '15340-error-counters'

closes #15340

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

Revision 2f4429e6
Added by Tom Clegg about 1 year ago

Merge branch '15340-error-counters'

refs #15340

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Clegg about 1 year ago

  • Blocks Story #13908: [Epic] Replace SLURM for cloud job scheduling/dispatching added

#2 Updated by Tom Morris about 1 year ago

  • Target version set to Arvados Future Sprints
  • Story points set to 1.0

#3 Updated by Tom Clegg about 1 year ago

  • Description updated (diff)

#6 Updated by Tom Clegg about 1 year ago

  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg

#7 Updated by Tom Clegg about 1 year ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

#8 Updated by Ward Vandewege about 1 year ago

  • Target version changed from Arvados Future Sprints to 2019-06-19 Sprint

#9 Updated by Peter Amstutz 7 months ago

  • Release set to 22
