Feature #16636

closed

[arvados-dispatch-cloud] Add instance metrics

Added by Tom Clegg almost 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 1.0
Release relationship: Auto

Description

From the wiki page "Dispatching containers to cloud VMs":

* †(gauge) number of containers allocated to VMs but not started yet (because VMs are pending/booting) (implemented)
* †(gauge) number of containers not allocated to VMs (because provider quota is reached) (implemented)
* †(summary) time elapsed between VM creation and first successful SSH connection to that VM (implemented)
* †(summary) time elapsed between first successful SSH connection on a VM and ready to run a container on that VM (implemented)
* †(counter) VMs that have either become ready or reached boot timeout, partitioned by success/timeout (implemented)
* †(summary) time elapsed between first shutdown attempt on a VM and its disappearance from the provider listing (implemented)
* †(summary) wait times (between seeing a container in the queue or requeueing, and starting its crunch-run process on a worker) across previous starts (implemented)
* †(gauge) longest wait time of any unstarted container (implemented)
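As a rough illustration of how the three metric kinds in this list differ, here is a minimal in-memory sketch. It is not the arvados-dispatch-cloud implementation (which is written in Go); the class `InstanceMetrics`, its method names, and the `"success"`/`"timeout"` label values are all hypothetical, and timestamps are passed in explicitly as plain seconds to keep the example deterministic.

```python
from collections import Counter


class InstanceMetrics:
    """Hypothetical in-memory stand-in for a metrics registry."""

    def __init__(self):
        # Counter: only ever increases, partitioned by outcome label.
        self.boot_outcomes = Counter()
        # Summaries: keep raw observations (seconds) for later aggregation.
        self.time_to_ssh = []
        self.wait_times = []
        # Container UUID -> time it was first seen in the queue or requeued.
        self.queue_entered = {}

    def observe_time_to_ssh(self, created_at, first_ssh_at):
        """Record time from VM creation to first successful SSH connection."""
        self.time_to_ssh.append(first_ssh_at - created_at)

    def record_boot_outcome(self, outcome):
        """Count a VM that became ready or reached the boot timeout."""
        if outcome not in ("success", "timeout"):
            raise ValueError("unknown boot outcome: %r" % (outcome,))
        self.boot_outcomes[outcome] += 1

    def container_queued(self, uuid, now):
        """Note when a container was seen in the queue (or requeued)."""
        self.queue_entered[uuid] = now

    def container_started(self, uuid, now):
        """Record the wait time once the container starts on a worker."""
        waited = now - self.queue_entered.pop(uuid)
        self.wait_times.append(waited)
        return waited

    def longest_wait(self, now):
        """Gauge: longest wait of any unstarted container, 0 if none."""
        return max((now - t for t in self.queue_entered.values()), default=0.0)
```

The gauge is recomputed from current state on each read and can go down as containers start, whereas the counter only grows and the summaries accumulate one observation per event.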

(The wiki lists one more metric, not yet implemented, which has its own issue: #15865.)


Subtasks 4 (0 open, 4 closed)

Task #16661: review 16636-boot-outcome-metrics (Resolved, Tom Clegg, 08/03/2020)
Task #16806: review 16636-add-time-to-ssh-metric (Resolved, Ward Vandewege, 09/03/2020)
Task #16829: review 16636-container-allocation-metrics (Resolved, Ward Vandewege, 08/03/2020)
Task #16837: review 16636-more-metrics (Resolved, Ward Vandewege, 09/14/2020)

Related issues

Related to Arvados - Idea #15865: [arvados-dispatch-cloud] Cumulative instance time and cost metrics (New)
Related to Arvados - Idea #13908: [Epic] Replace SLURM for cloud job scheduling/dispatching (Resolved)
Related to Arvados - Feature #16838: [a-d-c] probe metrics (Resolved, Ward Vandewege, 09/18/2020)
Related to Arvados - Feature #17185: [adc] add broken node metrics (New, Tom Clegg)