Bug #14920

[crunch-dispatch-cloud] New Azure instances always have state=unknown instead of state=booting

Added by Tom Clegg over 2 years ago. Updated over 2 years ago.

Currently, when a-d-c (arvados-dispatch-cloud) uses the Azure driver, new instances have state=unknown until the boot/run probes pass, instead of the expected state=booting.

The "unknown" state is intended for the case where the "list instances" call returns a previously unseen instance ID. In the Azure case, however, the "create VM" call does not return the new instance's ID until the instance has finished booting. Until then, the dispatcher's worker pool has no way to recognize that the listed instance corresponds to an outstanding "create" call.
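
To make the failure mode concrete, here is a minimal Go sketch of a worker pool that can match listed instances to its own Create calls only by instance ID. The `pool` type, the `State` values, and `stateForListed` are hypothetical simplifications for illustration, not the dispatcher's actual code.

```go
package main

import "fmt"

// State is a simplified worker state (hypothetical type).
type State string

const (
	StateUnknown State = "unknown"
	StateBooting State = "booting"
)

// pool is a hypothetical pool that only knows the IDs returned by
// Create calls that have already completed.
type pool struct {
	created map[string]bool
}

// stateForListed returns the state assigned to an instance seen in a
// "list instances" result when matching is done by ID alone.
func (p *pool) stateForListed(id string) State {
	if p.created[id] {
		return StateBooting
	}
	// Previously unseen ID: it might be left over from an earlier
	// dispatcher process, or (as with Azure) its Create call simply
	// has not returned yet. The pool cannot tell these apart.
	return StateUnknown
}

func main() {
	p := &pool{created: map[string]bool{}}
	// Azure case: the instance is listed while Create is still
	// waiting for it to boot, so its ID has not been recorded.
	fmt.Println(p.stateForListed("vm-1")) // unknown
	// After Create returns, the ID lookup succeeds.
	p.created["vm-1"] = true
	fmt.Println(p.stateForListed("vm-1")) // booting
}
```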

Some different ways to address this:
  • In the Azure driver, return as soon as the instance ID is known, instead of waiting for it to boot. This is how the driver is expected to work, but the Azure client library might not make it easy.
  • In the worker pool, when an unexpected instance ID appears, check whether its "secret token" tag matches an outstanding Create call. This would also cover the "list returns before create" race, which applies to all drivers.

The second option seems better.
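
A minimal Go sketch of the second option, assuming the pool generates a random secret for each Create call, passes it to the driver as an instance tag, and remembers it until the call returns. The tag key `secret` and the `Pool`, `BeginCreate`, `EndCreate`, and `Sync` names are hypothetical; the real worker pool's tag names and method signatures may differ.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// State is a simplified worker state (hypothetical type).
type State string

const (
	StateUnknown State = "unknown"
	StateBooting State = "booting"
)

// tagKeySecret is a hypothetical tag key used only for this sketch.
const tagKeySecret = "secret"

// Instance is a simplified entry from a "list instances" call.
type Instance struct {
	ID   string
	Tags map[string]string
}

// Pool tracks outstanding Create calls by secret and known instances by ID.
type Pool struct {
	mtx     sync.Mutex
	pending map[string]bool // secrets of Create calls that have not returned yet
	booting map[string]bool // IDs of instances known to be booting
}

func NewPool() *Pool {
	return &Pool{pending: map[string]bool{}, booting: map[string]bool{}}
}

// newSecret returns a random hex token to tag a new instance with.
func newSecret() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// BeginCreate registers an outstanding Create call before it is issued
// and returns the tags the driver should apply to the new instance.
func (p *Pool) BeginCreate() (map[string]string, error) {
	secret, err := newSecret()
	if err != nil {
		return nil, err
	}
	p.mtx.Lock()
	defer p.mtx.Unlock()
	p.pending[secret] = true
	return map[string]string{tagKeySecret: secret}, nil
}

// EndCreate is called when the driver's Create call returns the new ID.
func (p *Pool) EndCreate(secret, id string) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	delete(p.pending, secret)
	p.booting[id] = true
}

// Sync classifies one instance returned by a "list instances" call.
func (p *Pool) Sync(inst Instance) State {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	if p.booting[inst.ID] {
		return StateBooting
	}
	// An unfamiliar ID whose secret tag matches an outstanding Create
	// call is ours: the list call just returned before Create did.
	if secret, ok := inst.Tags[tagKeySecret]; ok && p.pending[secret] {
		delete(p.pending, secret)
		p.booting[inst.ID] = true
		return StateBooting
	}
	return StateUnknown
}

func main() {
	p := NewPool()
	tags, err := p.BeginCreate()
	if err != nil {
		panic(err)
	}
	// The new instance shows up in the list before Create returns.
	fmt.Println(p.Sync(Instance{ID: "vm-1", Tags: tags})) // booting
	p.EndCreate(tags[tagKeySecret], "vm-1")
	// An instance without a matching secret is still unknown.
	fmt.Println(p.Sync(Instance{ID: "vm-2"})) // unknown
}
```

Because the match is made on the tag rather than the ID, the pool recognizes the instance regardless of whether the list call or the Create call finishes first, which is why this approach also covers the "list returns before create" race for other drivers.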

It would also be worth documenting the expected driver behavior in the driver interface definition: Create() should generally return as soon as the new instance's ID is known, but must not return so early that a subsequent call to Instances() might not include the new instance.
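
The documented contract might look something like the following Go sketch. The `InstanceSet` interface here is a simplified, hypothetical stand-in (the actual Arvados driver interface differs), and `fakeSet` and `checkCreateVisible` are illustrative helpers showing how a driver test could check the "must not return too early" half of the contract.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type InstanceID string

// InstanceSet is a simplified, hypothetical cloud driver interface.
type InstanceSet interface {
	// Create starts a new instance with the given tags.
	//
	// Create should generally return as soon as the new instance's ID
	// is known, without waiting for the instance to finish booting.
	// However, it must not return so early that a subsequent call to
	// Instances might not include the new instance.
	Create(tags map[string]string) (InstanceID, error)

	// Instances returns the IDs of all instances that currently exist.
	Instances() ([]InstanceID, error)
}

// fakeSet is an in-memory driver that honors the contract above.
type fakeSet struct {
	mtx  sync.Mutex
	next int
	ids  []InstanceID
}

func (f *fakeSet) Create(tags map[string]string) (InstanceID, error) {
	f.mtx.Lock()
	defer f.mtx.Unlock()
	f.next++
	id := InstanceID(fmt.Sprintf("fake-%d", f.next))
	// Record the instance before returning so Instances() will list it.
	f.ids = append(f.ids, id)
	return id, nil
}

func (f *fakeSet) Instances() ([]InstanceID, error) {
	f.mtx.Lock()
	defer f.mtx.Unlock()
	// Return a copy so callers cannot modify internal state.
	return append([]InstanceID(nil), f.ids...), nil
}

// checkCreateVisible returns an error if an instance returned by Create
// is missing from the next Instances() result.
func checkCreateVisible(is InstanceSet) error {
	id, err := is.Create(nil)
	if err != nil {
		return err
	}
	list, err := is.Instances()
	if err != nil {
		return err
	}
	for _, got := range list {
		if got == id {
			return nil
		}
	}
	return errors.New("created instance " + string(id) + " missing from Instances()")
}

func main() {
	fmt.Println(checkCreateVisible(&fakeSet{})) // <nil>
}
```

This check cannot catch a driver that returns too late (for example, one that waits for the instance to boot, as described above for Azure); that half of the contract affects how quickly the pool learns the instance ID, not whether Instances() is consistent.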

Attachment: 14920-fixed.png (20 KB), Tom Clegg, 03/07/2019 07:38 PM


Task #14928: review 14920-unknown-booting-race (Resolved, Ward Vandewege)

Associated revisions

Revision 64e72e28
Added by Tom Clegg over 2 years ago

Merge branch '14920-unknown-booting-race'

fixes #14920

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>


#1 Updated by Tom Clegg over 2 years ago

  • Category set to Crunch
  • Target version set to To Be Groomed

#2 Updated by Tom Clegg over 2 years ago

  • Description updated (diff)

#3 Updated by Tom Clegg over 2 years ago

  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg
  • Target version changed from To Be Groomed to Arvados Future Sprints

14920-unknown-booting-race @ e49978c5d9bece2a1db646f36cdf346414dd8813

#4 Updated by Ward Vandewege over 2 years ago

Tom Clegg wrote:

14920-unknown-booting-race @ e49978c5d9bece2a1db646f36cdf346414dd8813

LGTM. I like that the code is a lot more elegant now!

#5 Updated by Tom Clegg over 2 years ago

While testing, I noticed that metrics didn't reflect the idle→running change right away when a container started. With that fixed (screenshot attached as 14920-fixed.png):

#6 Updated by Tom Clegg over 2 years ago

  • Status changed from In Progress to Resolved

#7 Updated by Tom Morris over 2 years ago

  • Target version changed from Arvados Future Sprints to 2019-03-13 Sprint

#8 Updated by Tom Morris over 2 years ago

  • Release set to 15
