Story #13921

Log stats for the entire node lifecycle

Added by Bryan Cosca 6 months ago. Updated 19 days ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

The original ticket expressed the need in terms of container requests, but I think we've got that piece reasonably well covered with our existing logging, so I'd like to repurpose this to log stats from the point of view of the Node, not the Container.

When a node is shut down, record how long it was up as well as times for: booting, downloading docker images, running docker images, and idle.
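As a rough illustration of the shutdown record described above, here is a minimal sketch that totals time per phase from a list of phase transitions. Everything in it is an assumption made for illustration: the phase names, the `(timestamp, phase)` event shape, and the `summarize_node_lifecycle` helper are hypothetical and do not correspond to any existing Arvados API.

```python
# Hypothetical phase names, taken from the list in the description.
PHASES = ("booting", "downloading", "running", "idle")


def summarize_node_lifecycle(transitions, shutdown_at):
    """Return total uptime and seconds spent in each phase.

    transitions: list of (timestamp_seconds, phase) tuples sorted by time,
                 each marking the moment the node entered that phase.
    shutdown_at: timestamp (seconds) at which the node was shut down.
    """
    transitions = list(transitions)
    totals = {phase: 0.0 for phase in PHASES}
    if not transitions:
        return {"uptime": 0.0, **totals}

    # Each phase lasts until the next transition; the last one ends at shutdown.
    ends = [t for t, _ in transitions[1:]] + [shutdown_at]
    for (start, phase), end in zip(transitions, ends):
        if phase not in totals:
            raise ValueError(f"unknown phase: {phase!r}")
        if end < start:
            raise ValueError("transitions must be sorted by time")
        totals[phase] += end - start

    uptime = shutdown_at - transitions[0][0]
    return {"uptime": uptime, **totals}


# Example: a node that boots, idles, pulls an image, runs, idles, then shuts down.
events = [
    (0, "booting"),
    (60, "idle"),
    (90, "downloading"),
    (150, "running"),
    (750, "idle"),
]
record = summarize_node_lifecycle(events, shutdown_at=900)
```

Because every second of uptime falls into exactly one phase, the per-phase totals sum to the uptime, which gives a cheap consistency check before the record is logged.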

This should allow system administrators to monitor overall performance, efficiency, and cost of the system orthogonally to jobs being run (which we account for through the container logs).

For billing purposes, we want to know the time between when the node is up and ready to be used and when crunch-run starts running, as well as the time between when crunch-run has stopped and when the node is no longer available (or has moved on to another container_request).
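The two billing intervals could be computed along these lines. This is only a sketch under stated assumptions: the `ContainerSlot` record, its field names, and `billing_overhead` are hypothetical, and how these timestamps would actually be captured is not specified in this ticket.

```python
from dataclasses import dataclass


@dataclass
class ContainerSlot:
    """Hypothetical timestamps (seconds) for one container on one node."""
    node_ready_at: float          # node is up and ready to be used
    crunch_run_started_at: float  # crunch-run begins running
    crunch_run_stopped_at: float  # crunch-run has stopped
    node_released_at: float       # node shut down, or picked up another container_request


def billing_overhead(slot):
    """Return the node time before and after crunch-run for one slot."""
    pre_run = slot.crunch_run_started_at - slot.node_ready_at
    post_run = slot.node_released_at - slot.crunch_run_stopped_at
    if pre_run < 0 or post_run < 0:
        raise ValueError("timestamps are out of order")
    return {
        "pre_run_wait": pre_run,
        "post_run_wait": post_run,
        "overhead_total": pre_run + post_run,
    }


# Example: 30 s waiting before crunch-run, 30 s lingering afterwards.
overhead = billing_overhead(ContainerSlot(100, 130, 730, 760))
```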

History

#1 Updated by Tom Morris 19 days ago

  • Target version set to To Be Groomed
  • Description updated (diff)
  • Subject changed from Log the entire node lifecycle for a container_request to Log stats for the entire node lifecycle
