Feature #17185
[adc] add broken node metrics
Start date:
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: -
Description
Add a broken-node metric: a counter of VMs that are determined to be "broken nodes".
Add a label that separates VMs marked as broken before the first container is started on them (likely a boot problem) from VMs marked as broken afterwards (likely a container-related problem).
Note that we already have a boot outcome metric. Make sure the broken-node counter is incremented (with the "before first container" label) whenever the boot outcome is "failed", but not when the boot times out.
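A minimal sketch of the counting rules described above, in Go. To stay self-contained it uses a small mutex-guarded map in place of a real Prometheus CounterVec. The label values, the BootOutcome constants, and the Instance, Metrics, MarkBroken and ReportBootOutcome names are all hypothetical and are not the dispatcher's actual identifiers.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical label values for the broken-node counter.
const (
	BeforeFirstContainer = "before_first_container" // likely a boot problem
	AfterFirstContainer  = "after_first_container"  // likely a container-related problem
)

// BootOutcome values are hypothetical; the dispatcher's real names may differ.
type BootOutcome string

const (
	BootSucceeded BootOutcome = "succeeded"
	BootFailed    BootOutcome = "failed"
	BootTimeout   BootOutcome = "timeout"
)

// LabeledCounter stands in for a Prometheus CounterVec with one label,
// so the sketch needs only the standard library.
type LabeledCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewLabeledCounter() *LabeledCounter {
	return &LabeledCounter{counts: make(map[string]int)}
}

// Inc adds one to the series selected by label.
func (c *LabeledCounter) Inc(label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[label]++
}

// Get returns the current value of the series selected by label.
func (c *LabeledCounter) Get(label string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[label]
}

// Instance (hypothetical) records whether any container has started on the VM.
type Instance struct {
	ID                string
	containersStarted int
}

// ContainerStarted notes that a container was started on this VM.
func (inst *Instance) ContainerStarted() { inst.containersStarted++ }

// Metrics (hypothetical) holds the new broken-node counter.
type Metrics struct {
	BrokenNodes *LabeledCounter
}

// MarkBroken counts inst as a broken node, labelled by whether a container
// had already been started on it when it was marked broken.
func (m *Metrics) MarkBroken(inst *Instance) {
	if inst.containersStarted == 0 {
		m.BrokenNodes.Inc(BeforeFirstContainer)
	} else {
		m.BrokenNodes.Inc(AfterFirstContainer)
	}
}

// ReportBootOutcome would sit next to the existing boot outcome metric.
// A failed boot also counts as a broken node before the first container;
// a boot timeout (or success) deliberately does not.
func (m *Metrics) ReportBootOutcome(outcome BootOutcome) {
	if outcome == BootFailed {
		m.BrokenNodes.Inc(BeforeFirstContainer)
	}
}

func main() {
	m := &Metrics{BrokenNodes: NewLabeledCounter()}

	m.ReportBootOutcome(BootFailed)  // counted: before_first_container
	m.ReportBootOutcome(BootTimeout) // not counted

	vm := &Instance{ID: "vm-1"}
	vm.ContainerStarted()
	m.MarkBroken(vm) // counted: after_first_container

	fmt.Println(m.BrokenNodes.Get(BeforeFirstContainer), m.BrokenNodes.Get(AfterFirstContainer))
}
```

Running it prints `1 1`: the failed boot is counted under the "before first container" label, the timed-out boot is ignored, and the VM that had already run a container is counted under the "after" label. Keying the label on whether any container has started, rather than on a particular instance state, keeps the boot-versus-container split independent of how the VM was eventually detected as broken.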
History
#1
Updated by Ward Vandewege over 1 year ago
- Related to Feature #16636: [arvados-dispatch-cloud] Add instance metrics added
#2
Updated by Ward Vandewege over 1 year ago
- Description updated (diff)
#3
Updated by Ward Vandewege over 1 year ago
- Related to Bug #17186: [dispatch] broken node logs should also be copied to a-d-c logs added
#4
Updated by Ward Vandewege over 1 year ago
- Description updated (diff)
#5
Updated by Tom Clegg about 1 year ago
- Target version set to 2021-04-28 bughunt sprint
- Assigned To set to Tom Clegg
- Category set to Crunch
#6
Updated by Peter Amstutz about 1 year ago
- Target version deleted (2021-04-28 bughunt sprint)