Bug #19967
closedcrunch-run periodically updates container cost
Description
Currently the container cost is recorded as 0 in this case.
This should ensure the cost of a cancel-retry-cancel-retry-cancel-fail cycle is reflected properly.
Have crunch-run periodically update the price while the container is running, perhaps each time it gets a SIGUSR2 after loadPrices (i.e., on each dispatcher probe).
Related issues
Updated by Peter Amstutz almost 2 years ago
- Target version changed from Future to To be scheduled
Updated by Peter Amstutz over 1 year ago
- Target version changed from To be scheduled to 2023-03-01 sprint
Updated by Peter Amstutz over 1 year ago
- Target version changed from 2023-03-01 sprint to Development 2023-03-15 sprint
Updated by Tom Clegg over 1 year ago
A proposed approach:
Part 1 is to have the dispatcher calculate cost when cancelling a container whose process (or entire instance) has disappeared. Normally the "container cleanup" stuff is done by lib/dispatchcloud/scheduler, which hits the cancel button here:
} else if !exited.IsZero() && qUpdated.After(exited) {
go sch.cancel(uuid, "state=Running after crunch-run exited")
This could stash cost-so-far while updating state to Cancelled. The only hitch is that scheduler doesn't know anything about prices.
(An alternative would be for the worker pool update the cost itself, in closeRunner() -- but I think it gets complicated quickly when we consider retrying failures, and races between "update cost field" (in worker pool) and "set state to cancelled" (scheduler).)
I'm thinking we need to update the Running() call in scheduler/interfaces.go so that it returns a map[string]RunStatus, where RunStatus is something like:
type RunStatus struct {
Exited bool
ExitTime time.Time
Cost float64
}
...where Cost>0 means the scheduler should use that as the final cost when it sets state=Cancelled.
(I'm sneaking in the separate Exited bool here to be more clear/explicit than the current "zero time means not exited" approach. Not strictly necessary, but might as well, if we're going to make it a struct, right?)
Worker pool can populate that with calculated cost if we- move most of the calculateCost() code from lib/crunchrun to lib/cloud, and
- in lib/worker, use NormalizePriceHistory to combine/remember the entire cloud-provided price history of each worker instead of just sending the latest data points to crunch-run and forgetting them.
Part 2 is to have crunch-run periodically update the price while the container is running, perhaps each time it gets a SIGUSR2 after loadPrices (i.e., on each dispatcher probe). This would be an easier way to accomplish nearly the same thing, and has the additional benefit of showing the cost-so-far for long running containers instead of showing $0 until the end.
It would be ideal to have both parts, but I wonder if we should do just Part 2, and get 90% benefit for 10% work.
Updated by Tom Clegg over 1 year ago
Tom Clegg wrote in #note-8:
I'm thinking we need to update the Running() call in scheduler/interfaces.go so that it returns a map[string]RunStatus, where RunStatus is something like:
On second thought:
type RunStatus interface {
Exited() bool
ExitTime() time.Time
Cost() float64
}
That way, worker pool can implement a concrete runStatus type with a *worker field and a Cost method that does the calculation, instead of computing all of the costs on every call to Running() only to have them unused/ignored nearly all of the time.
Updated by Brett Smith over 1 year ago
Tom Clegg wrote in #note-8:
It would be ideal to have both parts, but I wonder if we should do just Part 2, and get 90% benefit for 10% work.
Discussed at standup. We will start with option 2, both because it's less work, and it also has other benefits like showing providing cost information for long-running containers. We can lose 10ish minutes of cost reporting in the default configuration, but we're okay with that now. Creating a follow-up ticket for option 1.
Updated by Brett Smith over 1 year ago
- Precedes Idea #20239: Dispatcher calculates final container cost after runner/instance disappears added
Updated by Brett Smith over 1 year ago
- Subject changed from Record cost when dispatcher cancels a running container due to instance disappearance to crunch-run periodically updates container cost
Updated by Peter Amstutz over 1 year ago
- Target version changed from Development 2023-03-15 sprint to Development 2023-03-29 Sprint
Updated by Brett Smith over 1 year ago
- Status changed from New to In Progress
Tom Clegg wrote in #note-8:
Part 2 is to have crunch-run periodically update the price while the container is running, perhaps each time it gets a SIGUSR2 after loadPrices (i.e., on each dispatcher probe).
Implemented in 19967-crunch-run-cost-updates @ d1dcb64a9174c2da64306a6598e1e52a96344e6b - developer-run-tests: #3554
Updated by Lucas Di Pentima over 1 year ago
Sorry for the delay, I wasn't familiarized with that part of the code. This LGTM, thanks!
Updated by Brett Smith over 1 year ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|771852c25cba5d8da9de56f3503ae0606d817c8c.