Bug #14920

closed

[crunch-dispatch-cloud] New Azure instances always have state=unknown instead of state=booting

Added by Tom Clegg almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -
Release relationship: Auto

Description

Currently, when a-d-c uses the Azure driver, new instances have state=unknown (instead of the expected state=booting) until the boot/run probes pass.

The "unknown" state is intended to cover the case where the "list instances" call returns a previously unseen instance ID. In the Azure case, the "create VM" call does not return the ID of the newly created instance until the instance has finished booting. Until then, when the new instance shows up in a "list instances" result, the dispatcher's worker pool doesn't recognize that it corresponds to an outstanding "create" call, so it is treated as unknown.

Two ways to address this:
  • In the Azure driver, return as soon as the instance ID is known, instead of waiting for it to boot. This is how the driver is expected to work, but the Azure client library might not make it easy.
  • In the worker pool, when an unexpected instance ID appears, check whether its "secret token" tag matches an outstanding Create call. This would also cover the "list returns before create" race, which applies to all drivers.

The second option seems better.
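
A minimal sketch of the second option, assuming a simplified worker pool. All names here (Pool, BeginCreate, Sync, the Secret field) are hypothetical and are not taken from the actual dispatcher code; the sketch only shows how an unseen instance ID could be matched to an outstanding create call by its "secret token" tag.

```go
package main

import (
	"sync"
	"time"
)

// State is a simplified worker state, named after the states in this report.
type State string

const (
	StateUnknown State = "unknown"
	StateBooting State = "booting"
)

// Instance is a hypothetical stand-in for one entry of a "list instances" result.
type Instance struct {
	ID     string
	Secret string // value of the instance's "secret token" tag
}

// Pool remembers the secret given to each outstanding create call.
type Pool struct {
	mtx      sync.Mutex
	creating map[string]time.Time // secret -> time the create call started
	workers  map[string]State     // instance ID -> current state
}

func NewPool() *Pool {
	return &Pool{
		creating: map[string]time.Time{},
		workers:  map[string]State{},
	}
}

// BeginCreate records a secret before the driver's create call is issued,
// so a list result that arrives first can still be matched.
func (p *Pool) BeginCreate(secret string) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	p.creating[secret] = time.Now()
}

// Sync processes one "list instances" result. A previously unseen instance
// whose secret matches an outstanding create call starts in "booting";
// any other unseen instance starts in "unknown".
func (p *Pool) Sync(instances []Instance) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	for _, inst := range instances {
		if _, seen := p.workers[inst.ID]; seen {
			continue
		}
		if _, ok := p.creating[inst.Secret]; ok && inst.Secret != "" {
			delete(p.creating, inst.Secret)
			p.workers[inst.ID] = StateBooting
		} else {
			p.workers[inst.ID] = StateUnknown
		}
	}
}

// State returns the recorded state for an instance ID ("" if never seen).
func (p *Pool) State(id string) State {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	return p.workers[id]
}
```

Because the secret is recorded before the create call is made, the same lookup handles both the slow Azure create call and the "list returns before create" race.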

It would also be worth documenting the expected driver behavior in the driver interface definition: Create() should generally return as soon as the new instance's ID is known, but must not return so early that a subsequent call to Instances() might not include the new instance.
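
As an illustration of what such documentation could look like, here is a sketch of a doc comment on a simplified driver interface. The InstanceSet interface below is deliberately reduced and its signatures are assumptions, not the real driver interface; memSet and CheckCreateVisible are hypothetical helpers that only demonstrate the "visible in Instances() after Create() returns" requirement.

```go
package main

import "fmt"

type InstanceID string
type Tags map[string]string

// InstanceSet is a simplified, hypothetical driver interface.
type InstanceSet interface {
	// Create starts a new instance with the given tags.
	//
	// Create should generally return as soon as the new instance's ID
	// is known, without waiting for the instance to boot. It must not
	// return so early that a subsequent call to Instances() might not
	// include the new instance.
	Create(tags Tags) (InstanceID, error)

	// Instances returns the IDs of all instances in the set.
	Instances() ([]InstanceID, error)
}

// memSet is an in-memory InstanceSet used only to illustrate the contract.
type memSet struct {
	next int
	ids  []InstanceID
}

func (s *memSet) Create(tags Tags) (InstanceID, error) {
	s.next++
	id := InstanceID(fmt.Sprintf("vm-%d", s.next))
	// Record the instance before returning, so Instances() includes it.
	s.ids = append(s.ids, id)
	return id, nil
}

func (s *memSet) Instances() ([]InstanceID, error) {
	// Return a copy so callers cannot modify the internal slice.
	return append([]InstanceID(nil), s.ids...), nil
}

// CheckCreateVisible creates one instance and reports an error if that
// instance is missing from the next Instances() result.
func CheckCreateVisible(s InstanceSet) error {
	id, err := s.Create(Tags{"example": "1"})
	if err != nil {
		return err
	}
	ids, err := s.Instances()
	if err != nil {
		return err
	}
	for _, got := range ids {
		if got == id {
			return nil
		}
	}
	return fmt.Errorf("instance %s missing from Instances() after Create()", id)
}
```

A check along these lines could be run against each driver's test stub to confirm it follows the documented behavior.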


Files

14920-fixed.png (20 KB), Tom Clegg, 03/07/2019 07:38 PM

Subtasks: 1 (0 open, 1 closed)

Task #14928: review 14920-unknown-booting-race (Resolved, Ward Vandewege, 03/07/2019)
Actions #1

Updated by Tom Clegg almost 6 years ago

  • Category set to Crunch
  • Target version set to To Be Groomed
Actions #2

Updated by Tom Clegg almost 6 years ago

  • Description updated (diff)
Actions #3

Updated by Tom Clegg almost 6 years ago

  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg
  • Target version changed from To Be Groomed to Arvados Future Sprints

14920-unknown-booting-race @ e49978c5d9bece2a1db646f36cdf346414dd8813

Actions #4

Updated by Ward Vandewege almost 6 years ago

Tom Clegg wrote:

14920-unknown-booting-race @ e49978c5d9bece2a1db646f36cdf346414dd8813

LGTM. I like that the code is a lot more elegant now!

Actions #5

Updated by Tom Clegg almost 6 years ago

I noticed while testing that metrics didn't reflect the idle→running change right away when starting a container. With that fixed:

Actions #6

Updated by Tom Clegg almost 6 years ago

  • Status changed from In Progress to Resolved
Actions #7

Updated by Tom Morris almost 6 years ago

  • Target version changed from Arvados Future Sprints to 2019-03-13 Sprint
Actions #8

Updated by Tom Morris over 5 years ago

  • Release set to 15