Feature #15340
[arvados-dispatch-cloud] Error-counting metrics
Start date:
Due date:
% Done: 100%
Estimated time:
Story points: 1.0
Release:
Release relationship: Auto
Description
Add to prometheus metrics:
counter vector arvados_dispatchcloud_driver_operations
- number of cloud operations, split by operation type (op=Create/Destroy/List/SetTags) and result (error=0/1)
- can be implemented as a driver proxy similar to rateLimitedInstanceSet in source:lib/dispatchcloud/driver.go
- most likely usage in graphs/alerts is
arvados_dispatchcloud_driver_operations{error="1"}
arvados_dispatchcloud_instances_disappeared
- number of times an instance disappeared in cloud (see sync() in source:lib/dispatchcloud/worker/pool.go), split by state
- most likely usage in graphs/alerts is
arvados_dispatchcloud_instances_disappeared{state!="shutdown"}
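A minimal sketch of the two counters described above, not the merged implementation. The instanceSet interface, countingInstanceSet, flakySet, tally and countDisappeared are all hypothetical names invented for illustration, and the real driver interface in lib/dispatchcloud has different signatures. An in-memory map stands in for the Prometheus counter vectors so the example needs only the standard library; with the Prometheus Go client the counterFunc could instead wrap a CounterVec and call WithLabelValues(op, errLabel).Inc().

```go
package main

import (
	"errors"
	"fmt"
)

// instanceSet is a hypothetical, pared-down stand-in for the cloud driver
// interface (assumption: the real one has more methods and richer types).
type instanceSet interface {
	Create(name string) error
	Instances() ([]string, error)
}

// counterFunc records one operation, labelled by op and error ("0" or "1").
type counterFunc func(op, errLabel string)

// countingInstanceSet is a driver proxy: it forwards each call to the
// wrapped instanceSet and counts it, split by operation type and result.
type countingInstanceSet struct {
	instanceSet
	count counterFunc
}

// errLabel maps an error to the label values proposed in the ticket.
func errLabel(err error) string {
	if err != nil {
		return "1"
	}
	return "0"
}

func (c *countingInstanceSet) Create(name string) error {
	err := c.instanceSet.Create(name)
	c.count("Create", errLabel(err))
	return err
}

func (c *countingInstanceSet) Instances() ([]string, error) {
	list, err := c.instanceSet.Instances()
	c.count("List", errLabel(err))
	return list, err
}

// flakySet is a fake driver used only for this demonstration.
type flakySet struct{ names []string }

func (f *flakySet) Create(name string) error {
	if name == "" {
		return errors.New("empty instance name")
	}
	f.names = append(f.names, name)
	return nil
}

func (f *flakySet) Instances() ([]string, error) { return f.names, nil }

// tally is an in-memory replacement for a counter vector.
type tally map[string]int

func (t tally) inc(op, errLabel string) {
	t[fmt.Sprintf("%s/error=%s", op, errLabel)]++
}

// countDisappeared increments a per-state counter for every known instance
// that is missing from the latest cloud listing, then forgets that instance.
func countDisappeared(known map[string]string, listed []string, t tally) {
	present := make(map[string]bool, len(listed))
	for _, name := range listed {
		present[name] = true
	}
	for name, state := range known {
		if !present[name] {
			t["disappeared/state="+state]++
			delete(known, name)
		}
	}
}

func main() {
	ops := tally{}
	set := &countingInstanceSet{instanceSet: &flakySet{}, count: ops.inc}
	_ = set.Create("a")
	_ = set.Create("") // fails, so it is counted with error=1
	_, _ = set.Instances()
	fmt.Println(ops["Create/error=0"], ops["Create/error=1"], ops["List/error=0"])
}
```

Embedding the wrapped instanceSet keeps the proxy transparent to callers, in the same spirit as the rateLimitedInstanceSet suggestion above. Only Create and List are shown; Destroy and SetTags would follow the same pattern.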
Related issues
Associated revisions
Merge branch '15340-error-counters'
refs #15340
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg@veritasgenetics.com>
History
#1
Updated by Tom Clegg over 1 year ago
- Blocks Story #13908: [Epic] Replace SLURM for cloud job scheduling/dispatching added
#2
Updated by Tom Morris over 1 year ago
- Target version set to Arvados Future Sprints
- Story points set to 1.0
#3
Updated by Tom Clegg over 1 year ago
- Description updated (diff)
#4
Updated by Tom Clegg over 1 year ago
#5
Updated by Ward Vandewege over 1 year ago
Tom Clegg wrote:
15340-error-counters @ 42966c194493f8e42e26e3d64880e5c93a9c3251 -- https://ci.curoverse.com/view/Developer/job/developer-run-tests/1312/
Assuming there are no further test failures in https://ci.curoverse.com/view/Developer/job/developer-run-tests/1314/, 15340-error-counters @ 92d72b35447cf7210728725b217812492b3855e0 LGTM
#6
Updated by Tom Clegg over 1 year ago
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
#7
Updated by Tom Clegg over 1 year ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|8b9fef1bf288427d6581a229c2663a96915501b2.
#8
Updated by Ward Vandewege over 1 year ago
- Target version changed from Arvados Future Sprints to 2019-06-19 Sprint
#9
Updated by Peter Amstutz about 1 year ago
- Release set to 22
Merge branch '15340-error-counters'
closes #15340
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tclegg@veritasgenetics.com>