Bug #9018


[Node manager] exception handler should not kill parent process

Added by Tom Clegg over 8 years ago. Updated about 8 years ago.

Assigned To:
Category: Node Manager
Target version: 2016-05-25 sprint
Story points:


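A race condition in test_fatal_error (tests.test_failure.ActorUnhandledExceptionTest) causes os.killpg() to be called after the test's stub for it has been removed, so the real os.killpg() runs. This kills the test suite and its parent process, because os.killpg() signals every process in the process group.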

There are two problems here:
  • The test should not have a race condition
  • The exception handler should only kill Node Manager itself, not other processes.

Proposed fix for overkill

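Use os._exit() or os.kill(os.getpid(), 9) instead of os.killpg(). (os.kill(0, 9) would not help: a PID of 0 sends the signal to every process in the caller's process group, just as os.killpg() does.)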
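The sketch below illustrates the difference between killing only the current process and signalling the whole process group. It assumes a POSIX system. The child programs and the run_child helper are hypothetical stand-ins, not Node Manager code. Each child is started with subprocess, so it inherits the parent's process group. That is why only a self-targeted kill is safe here.

```python
import os
import signal
import subprocess
import sys

# Hypothetical child program standing in for Node Manager hitting a fatal
# error. It signals only its own PID. Using os.kill(0, ...) or os.killpg(...)
# here would also kill this parent script, because the child inherits the
# parent's process group.
FATAL_CHILD = """
import os, signal
def on_fatal_error():
    os.kill(os.getpid(), signal.SIGKILL)  # only this process dies
on_fatal_error()
"""

# Hypothetical child that exits immediately without signalling anyone.
EXIT_CHILD = """
import os
os._exit(1)  # skips cleanup handlers; no other process is affected
"""


def run_child(source):
    """Run source in a child interpreter (same process group as this one)
    and return its exit status. On POSIX, -N means "killed by signal N"."""
    return subprocess.run([sys.executable, "-c", source]).returncode


if __name__ == "__main__":
    print(run_child(FATAL_CHILD))  # -9: the child was killed by SIGKILL
    print(run_child(EXIT_CHILD))   # 1: the child's own exit status
    print("parent still alive")
```

If os.getpid() in FATAL_CHILD were replaced with 0, the signal would reach the whole group, parent included. That is the overkill this bug describes.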

Proposed fix for test race


#1

Updated by Tom Clegg over 8 years ago

  • Description updated (diff)
  • Category set to Node Manager

#2

Updated by Brett Smith about 8 years ago

  • Target version set to Arvados Future Sprints

#3

Updated by Peter Amstutz about 8 years ago

  • Target version changed from Arvados Future Sprints to 2016-05-25 sprint

#4

Updated by Peter Amstutz about 8 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:aea5300167770beb3cca6ad90e5ebb04da961416.

#5

Updated by Tom Clegg about 8 years ago

The test race might still exist. However, it hasn't been seen recently, so maybe some other changes have fixed it by accident.

(11:07:12) tetron_: I haven't seen the race condition happen 
(11:07:59) tetron_: and I haven't been able to work out a sequence that would cause it to happen
(11:10:51) tetron_: I believe the race only happens if the test also fails for some other reason and it's unable to wait for the actor to stop
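According to tetron_'s explanation, the race window is the gap between the test removing its stub and the actor actually stopping. The sketch below shows one way to close that window. Instead of patching os.killpg() globally, it injects the kill function into the actor, so a late call lands on the mock no matter when the test finishes. FakeActor and FatalErrorTest are hypothetical; FakeActor is a threading.Thread stand-in for a Node Manager actor. This is not necessarily how the project resolved the race.

```python
import os
import signal
import threading
import unittest
from unittest import mock


class FakeActor(threading.Thread):
    """Hypothetical stand-in for a Node Manager actor whose unhandled-exception
    path calls a kill function."""

    def __init__(self, kill=os.kill):
        super().__init__(daemon=True)  # a stuck actor won't block interpreter exit
        self.kill = kill  # injected, so tests never need to patch os globally

    def run(self):
        try:
            raise MemoryError("simulated fatal error")
        except MemoryError:
            # Kill only this process (threads share the PID), not the group.
            self.kill(os.getpid(), signal.SIGKILL)


class FatalErrorTest(unittest.TestCase):
    def test_fatal_error_kills_only_self(self):
        kill = mock.Mock()
        actor = FakeActor(kill=kill)
        actor.start()
        actor.join(timeout=5)
        # Even if this assertion fails, there is no global stub to remove,
        # so a late call from the actor still reaches the mock, not os.kill.
        self.assertFalse(actor.is_alive())
        kill.assert_called_once_with(os.getpid(), signal.SIGKILL)
```

With global patching (for example mock.patch("os.killpg")), the same test has to wait for the actor inside the patch context. A failure before that wait would reopen the window tetron_ describes.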
