Bug #9018

closed

[Node manager] exception handler should not kill parent process

Added by Tom Clegg almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: Node Manager
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Story points: -

Description

A race condition in test_fatal_error (tests.test_failure.ActorUnhandledExceptionTest) causes os.killpg() to be called after it has been unstubbed. This kills the test suite and run-tests.sh.

There are two problems here:
  • The test should not have a race condition
  • The exception handler should only kill node manager itself, not other processes.

Proposed fix for overkill

Use os._exit() or os.kill(os.getpid(), 9) instead of os.killpg()
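
A minimal sketch of the "kill only ourselves" idea, not the actual Node Manager code. The function name kill_self and its killfunc parameter are hypothetical, introduced here so that a test can substitute a stub. Note that on POSIX a pid of 0 addresses the caller's whole process group, so the sketch passes os.getpid() explicitly.

```python
import os
import signal


def kill_self(killfunc=os.kill):
    """Send SIGKILL to the current process only.

    killfunc is a hypothetical injection point: tests can pass a stub
    so that nothing is actually killed.
    """
    # os.getpid() targets this process alone. A pid of 0 would target
    # every process in the caller's process group on POSIX.
    killfunc(os.getpid(), signal.SIGKILL)
```

Because the kill function is passed in rather than looked up from the os module at call time, a late call cannot reach the real os.kill once a test has handed over a stub.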

Proposed fix for test race

TBD?

Actions #1

Updated by Tom Clegg almost 7 years ago

  • Description updated (diff)
  • Category set to Node Manager
Actions #2

Updated by Brett Smith almost 7 years ago

  • Target version set to Arvados Future Sprints
Actions #3

Updated by Peter Amstutz over 6 years ago

  • Target version changed from Arvados Future Sprints to 2016-05-25 sprint
Actions #4

Updated by Peter Amstutz over 6 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:aea5300167770beb3cca6ad90e5ebb04da961416.

Actions #5

Updated by Tom Clegg over 6 years ago

The test race might still exist. However, it hasn't been seen recently, so maybe some other changes have fixed it by accident.

(11:07:12) tetron_: I haven't seen the race condition happen 
(11:07:59) tetron_: and I haven't been able to work out a sequence that would cause it to happen
(11:10:51) tetron_: I believe the race only happens if the test also fails for some other reason and it's unable to wait for the actor to stop
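
One way to picture the point about waiting for the actor to stop, as a rough sketch only: FakeActor and run_fatal_error_test are hypothetical stand-ins, not the code in tests.test_failure. The stub is handed to the actor directly and the test joins the actor thread before inspecting anything, so the handler cannot run after the stub has gone away.

```python
import threading
from unittest import mock


class FakeActor(threading.Thread):
    """Hypothetical stand-in for an actor with a fatal-error handler."""

    def __init__(self, killfunc):
        super().__init__()
        self.killfunc = killfunc  # stub is owned by the actor

    def run(self):
        try:
            raise MemoryError("simulated fatal error")
        except MemoryError:
            # The handler calls the injected function, never os.killpg().
            self.killfunc()


def run_fatal_error_test(timeout=5.0):
    kill_stub = mock.Mock()
    actor = FakeActor(kill_stub)
    actor.start()
    # Wait for the actor to stop before checking the stub.
    actor.join(timeout)
    stopped = not actor.is_alive()
    return stopped, kill_stub.call_count
```

If the join times out, the first return value is False and the test can fail without ever restoring a real kill function underneath a still-running actor.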