Feature #17185

open

[adc] add broken node metrics

Added by Ward Vandewege over 3 years ago. Updated 2 months ago.

Status: New
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -
Release:
Release relationship: Auto

Description

Add a broken-node metric

(counter) VMs that are determined to be "broken nodes"

Add a label to distinguish VMs marked as broken before the first container is started on them (likely a boot problem) from VMs marked as broken afterwards (likely a container-related problem).

Note that we already have a boot outcome metric. Make sure that we increment the broken node counter (with the "before first container" label) when the boot outcome is "failed", but not when the boot timed out.
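The requested behaviour could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the arvados-dispatch-cloud implementation: the label values, the boot outcome strings, and all type and function names (`brokenNodeCounter`, `markBroken`, `observeBootOutcome`) are hypothetical, and a real implementation would presumably register a labeled counter with the metrics library the dispatcher already uses.

```go
package main

import (
	"fmt"
	"sync"
)

// Label values for the broken-node counter. These names are hypothetical.
const (
	beforeFirstContainer = "before_first_container"
	afterFirstContainer  = "after_first_container"
)

// bootOutcome models the values of the existing boot outcome metric.
// The exact strings are assumptions made for this sketch.
type bootOutcome string

const (
	bootSucceeded bootOutcome = "success"
	bootFailed    bootOutcome = "failed"
	bootTimedOut  bootOutcome = "timeout"
)

// brokenNodeCounter is a stand-in for a labeled counter metric:
// one monotonically increasing count per label value.
type brokenNodeCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newBrokenNodeCounter() *brokenNodeCounter {
	return &brokenNodeCounter{counts: map[string]int{}}
}

func (c *brokenNodeCounter) inc(label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[label]++
}

func (c *brokenNodeCounter) get(label string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[label]
}

// markBroken records a VM that was determined to be a broken node.
// startedContainer says whether a container had already been started on it.
func (c *brokenNodeCounter) markBroken(startedContainer bool) {
	if startedContainer {
		c.inc(afterFirstContainer)
	} else {
		c.inc(beforeFirstContainer)
	}
}

// observeBootOutcome counts a failed boot as a node broken before its
// first container. Timeouts and successful boots are not counted.
func (c *brokenNodeCounter) observeBootOutcome(o bootOutcome) {
	if o == bootFailed {
		c.markBroken(false)
	}
}

func main() {
	c := newBrokenNodeCounter()
	c.observeBootOutcome(bootFailed)    // counted, "before" label
	c.observeBootOutcome(bootTimedOut)  // not counted
	c.observeBootOutcome(bootSucceeded) // not counted
	c.markBroken(true)                  // counted, "after" label
	fmt.Println(beforeFirstContainer, c.get(beforeFirstContainer))
	fmt.Println(afterFirstContainer, c.get(afterFirstContainer))
}
```

Keeping the boot-failure case in a separate function makes the "failed but not timeout" rule explicit and easy to test independently of however broken nodes are detected later in a VM's life.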


Subtasks 1 (1 open, 0 closed)

Task #17554: Review (New)

Related issues

Related to Arvados - Feature #16636: [arvados-dispatch-cloud] Add instance metrics (Resolved, Ward Vandewege, 08/03/2020)
Related to Arvados - Bug #17186: [dispatch] broken node logs should also be copied to a-d-c logs (New)