Feature #17185

[adc] add broken node metrics

Added by Ward Vandewege over 1 year ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assigned To:
Category: Crunch
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: -

Description

Add a broken-node metric: a counter of VMs that are determined to be "broken nodes".

Add a label to distinguish VMs marked as broken before the first container is started on them (likely a boot problem) from VMs marked as broken afterwards (likely a container-related problem).

Note that we already have a boot outcome metric. Make sure the broken node counter (with the "before first container" label) is incremented when the boot outcome is "failed", but not when the boot times out.
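The counting rules described above could look roughly like the following. This is only a stdlib Python sketch of the logic, not the dispatcher's actual code: the class name, the method names, the label values, and the outcome strings ("failed", "timeout", "success") are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical label values; the ticket only distinguishes "before" and
# "after" the first container is started on the VM.
BEFORE_FIRST_CONTAINER = "before_first_container"
AFTER_FIRST_CONTAINER = "after_first_container"


class BrokenNodeMetrics:
    """Counts VMs declared broken, split by when the breakage was detected."""

    def __init__(self):
        # Maps label value -> number of broken VMs seen with that label.
        self.broken_nodes = Counter()

    def record_broken(self, containers_started):
        """Count a VM that was marked broken while it was running."""
        if containers_started > 0:
            label = AFTER_FIRST_CONTAINER   # likely a container-related problem
        else:
            label = BEFORE_FIRST_CONTAINER  # likely a boot problem
        self.broken_nodes[label] += 1

    def record_boot_outcome(self, outcome):
        """Mirror a boot outcome into the broken-node counter.

        Only a failed boot counts as a broken node; a boot timeout
        (or any other outcome) leaves the counter unchanged.
        """
        if outcome == "failed":
            self.broken_nodes[BEFORE_FIRST_CONTAINER] += 1


m = BrokenNodeMetrics()
m.record_boot_outcome("failed")        # counted, "before first container"
m.record_boot_outcome("timeout")       # not counted
m.record_boot_outcome("success")       # not counted
m.record_broken(containers_started=2)  # counted, "after first container"
print(dict(m.broken_nodes))
# prints {'before_first_container': 1, 'after_first_container': 1}
```

Keeping a single counter with one label, rather than two separate counters, means the total number of broken nodes is simply the sum over label values.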


Subtasks

Task #17554: Review (New)


Related issues

Related to Arvados - Feature #16636: [arvados-dispatch-cloud] Add instance metrics (Resolved, 08/03/2020)

Related to Arvados - Bug #17186: [dispatch] broken node logs should also be copied to a-d-c logs (New)

History

#1 Updated by Ward Vandewege over 1 year ago

  • Related to Feature #16636: [arvados-dispatch-cloud] Add instance metrics added

#2 Updated by Ward Vandewege over 1 year ago

  • Description updated (diff)

#3 Updated by Ward Vandewege over 1 year ago

  • Related to Bug #17186: [dispatch] broken node logs should also be copied to a-d-c logs added

#4 Updated by Ward Vandewege over 1 year ago

  • Description updated (diff)

#5 Updated by Tom Clegg about 1 year ago

  • Target version set to 2021-04-28 bughunt sprint
  • Assigned To set to Tom Clegg
  • Category set to Crunch

#6 Updated by Peter Amstutz about 1 year ago

  • Target version deleted (2021-04-28 bughunt sprint)
