Bug #20511

Updated by Peter Amstutz 12 months ago

I don't know how to interpret this, but with arvados-dispatch-cloud running a large job (MaxInstances=400), I am seeing a trend of roughly two "aborted" instances for every "successful" instance in the arvados_dispatchcloud_boot_outcomes metric. In other words, the "aborted" line is growing twice as fast as the "successful" line.

 I'm wondering if this is related to #20457 and some kind of churn at the top of the queue. 

edit: I'm looking at the code, and it looks like "aborted" might just mean the node was shut down intentionally. Is that right? (This feels like a bad choice of terminology, since "aborted" usually means terminating because of an error condition.)

I'm trying to understand why the numbers are out of balance: shouldn't there be one shutdown for every successful startup?
