Feature #18205
Updated by Peter Amstutz over 2 years ago
This applies to cloud installations only.

Add a data structure to the container record that contains:

* the total cost of a container (if the container is running, the cost up to this point in time)
* the current hourly cost of that container, if it is not complete yet (after it is complete, it could be updated to the average hourly cost over the duration of the container)

This data structure would be populated at a configurable `ContainerCostUpdateInterval` (default every 2 minutes?) for running containers, as well as immediately at the start and the end of a container.

For AWS spot instances, we would have a goroutine that stores the relevant spot price history (DescribeSpotPriceHistory API call) for the configured availability zone in the database. (Awkwardly, we currently derive the availability zone from `Containers/CloudVMs/DriverParameters/SubnetID`; perhaps we should add an explicit value in the config file.) The history would only be stored up to the oldest running job (and maybe, while nothing is running, up to the current point in time?). It could be refreshed at the same frequency as `ContainerCostUpdateInterval`, but ideally we would stop updating when nothing is running.

For regular AWS instances, we can get the hourly price for all node types in our AZ, API call TBD. This could be updated once a day (configurable?).

For both spot and regular AWS instances, we also need to account for the cost of any extra attached EBS storage. API call TBD.

As a bonus, or a follow-on, it would also be useful for the container request record to contain:

* the total cost to run the process tree (sum of the container and all child containers)
* the incremental cost to run the process tree (this is the previous sum excluding container requests that reused existing containers)

This could be calculated by the API server or controller and added to the container request (maybe as a property) so that it is easily available for display in Workbench.
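To make the proposal more concrete, here is a minimal sketch of how the per-container cost record could be computed from a stored spot price history. All names (`PricePoint`, `ContainerCost`, `CostBetween`, `RateAt`, `Snapshot`) are hypothetical and not part of the existing codebase. The sketch assumes the price history is a list of price changes, each in effect until the next one, and it ignores EBS storage cost. Time before the first known price point is not charged.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// PricePoint is a hypothetical record of one price change: the hourly
// price that takes effect at the given time.
type PricePoint struct {
	Effective time.Time
	Hourly    float64 // USD per hour
}

// ContainerCost is a hypothetical shape for the cost data on a container record.
type ContainerCost struct {
	Total      float64 // cost accrued so far
	HourlyRate float64 // current rate while running, average rate once finished
}

// CostBetween sums price * duration over [start, end), treating each price
// as constant until the next price point. The input need not be sorted.
func CostBetween(history []PricePoint, start, end time.Time) float64 {
	pts := append([]PricePoint(nil), history...)
	sort.Slice(pts, func(i, j int) bool { return pts[i].Effective.Before(pts[j].Effective) })
	total := 0.0
	for i, p := range pts {
		// This price applies from its Effective time until the next point (or end).
		from, to := p.Effective, end
		if i+1 < len(pts) {
			to = pts[i+1].Effective
		}
		// Clip the segment to the container's lifetime.
		if from.Before(start) {
			from = start
		}
		if to.After(end) {
			to = end
		}
		if to.After(from) {
			total += p.Hourly * to.Sub(from).Hours()
		}
	}
	return total
}

// RateAt returns the price in effect at time t, or 0 if none is known yet.
func RateAt(history []PricePoint, t time.Time) float64 {
	var best PricePoint
	found := false
	for _, p := range history {
		if !p.Effective.After(t) && (!found || p.Effective.After(best.Effective)) {
			best, found = p, true
		}
	}
	return best.Hourly
}

// Snapshot builds the cost record as of now. For a finished container,
// now should be the finish time and HourlyRate becomes the average rate.
func Snapshot(history []PricePoint, start, now time.Time, finished bool) ContainerCost {
	c := ContainerCost{Total: CostBetween(history, start, now)}
	if !finished {
		c.HourlyRate = RateAt(history, now)
	} else if h := now.Sub(start).Hours(); h > 0 {
		c.HourlyRate = c.Total / h
	}
	return c
}

func main() {
	t0 := time.Date(2021, 10, 1, 0, 0, 0, 0, time.UTC)
	history := []PricePoint{
		{Effective: t0, Hourly: 0.10},
		{Effective: t0.Add(2 * time.Hour), Hourly: 0.20},
	}
	// Container started one hour in and has been running for two hours.
	c := Snapshot(history, t0.Add(time.Hour), t0.Add(3*time.Hour), false)
	fmt.Printf("total=%.2f rate=%.2f\n", c.Total, c.HourlyRate)
}
```

A periodic updater running every `ContainerCostUpdateInterval` could call something like `Snapshot` for each running container, and once more with `finished` set when the container ends. Regular (non-spot) instances would fit the same shape with a single price point per instance type.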
Follow-on work, outside the scope of this ticket:

* Azure support for regular and spot instances
* Costanalyzer support to use this cost data
* Workbench2 support to show this cost data live
* Gather interesting statistics for spot instances that could inform scheduling decisions in the future (e.g. how frequently certain node types get evicted, or the average spot price by instance type over time)