Story #15865

[arvados-dispatch-cloud] Cumulative instance time and cost metrics

Added by Tom Clegg 5 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

From "Dispatching containers to cloud VMs":

  • (counter) cumulative instance time and cost, partitioned by allocation state and node type
In principle, these metrics can be estimated from the existing instances_total and instances_price metrics. However:
  • Prometheus doesn't offer an integral function.
  • Updating a cumulative metric after every worker/pool update (i.e., in updateMetrics()) would provide decent accuracy even when Prometheus sampling is infrequent or unreliable.
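
A minimal sketch of the accumulation idea, not the actual arvados-dispatch-cloud code: on each update, the time elapsed since the previous update is credited to every instance as it was observed at that previous update, partitioned by allocation state and node type. All type, field, and function names here (usageKey, instance, usageCounters, add, update) are hypothetical, and the price is assumed to be expressed per hour.

```go
package main

import (
	"fmt"
	"time"
)

// usageKey identifies one partition of the cumulative metrics.
type usageKey struct {
	Category     string // allocation state, e.g. "inuse"
	InstanceType string // node type, e.g. "z1.xxl"
}

// instance is a hypothetical summary of one live instance at update time.
type instance struct {
	Category     string
	InstanceType string
	PricePerHour float64 // assumption: price is quoted per hour
}

// usageCounters holds monotonically increasing totals per partition.
type usageCounters struct {
	lastUpdate time.Time
	prev       []instance // instances as observed at lastUpdate
	seconds    map[usageKey]float64
	cost       map[usageKey]float64
}

func newUsageCounters() *usageCounters {
	return &usageCounters{
		seconds: map[usageKey]float64{},
		cost:    map[usageKey]float64{},
	}
}

// add credits secs seconds of one instance's time at the given hourly price.
func (u *usageCounters) add(k usageKey, secs, pricePerHour float64) {
	u.seconds[k] += secs
	u.cost[k] += secs / 3600 * pricePerHour
}

// update credits the interval since the previous call to the instances
// seen at that previous call, then remembers the current snapshot.
func (u *usageCounters) update(now time.Time, current []instance) {
	if !u.lastUpdate.IsZero() {
		elapsed := now.Sub(u.lastUpdate).Seconds()
		for _, inst := range u.prev {
			u.add(usageKey{inst.Category, inst.InstanceType}, elapsed, inst.PricePerHour)
		}
	}
	u.lastUpdate = now
	u.prev = current
}

func main() {
	u := newUsageCounters()
	t0 := time.Unix(0, 0)
	// One instance stays in use for an hour, then disappears.
	u.update(t0, []instance{{"inuse", "z1.xxl", 1.234}})
	u.update(t0.Add(time.Hour), nil)
	for k, s := range u.seconds {
		fmt.Printf("instance_usage_seconds{category=%q,instance_type=%q} %g\n", k.Category, k.InstanceType, s)
		fmt.Printf("instance_usage_cost{category=%q,instance_type=%q} %g\n", k.Category, k.InstanceType, u.cost[k])
	}
}
```

Because the totals only ever grow, a scrape that arrives late or is missed loses no usage: the next sample still carries the full cumulative value.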

Related issues

Related to Arvados - Story #13908: [Epic] Replace SLURM for cloud job scheduling/dispatching (New)

History

#1 Updated by Tom Clegg 5 months ago

  • Related to Story #13908: [Epic] Replace SLURM for cloud job scheduling/dispatching added

#2 Updated by Tom Clegg 5 months ago

Example:

instance_usage_cost{category="inuse",instance_type="z1.xxl"} 1.234
instance_usage_seconds{category="inuse",instance_type="z1.xxl"} 3600
