Bug #8932

closed

[Node manager] Always crash on_failure()

Added by Peter Amstutz about 8 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

Currently Node manager kills itself on certain types of actor failure:

    def on_failure(self, exception_type, exception_value, tb):
        lg = getattr(self, "_logger", logging)
        if (exception_type in (threading.ThreadError, MemoryError) or
            exception_type is OSError and exception_value.errno == errno.ENOMEM):
            lg.critical("Unhandled exception is a fatal error, killing Node Manager")
            os.killpg(os.getpgid(0), 9)

However, experience suggests that an unexpected, unhandled actor failure (which stops the actor) usually causes node manager to misbehave at best, or wedges it completely at worst. Especially now that #8799 is merged (so node manager can recover when a shutdown actor is interrupted), I propose that node manager kill itself on all unhandled exceptions, not just the thread and memory errors checked above.
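A minimal sketch of what the proposed behavior might look like, assuming the handler keeps the same on_failure() signature and the same kill call as the current code. The class name FailFastActorMixin and the helper _kill_process_group() are hypothetical and only there to make the sketch self-contained and testable; this is not the change that was actually merged.

```python
import logging
import os
import signal


class FailFastActorMixin(object):
    """Hypothetical mixin: treat every unhandled actor exception as fatal."""

    def _kill_process_group(self):
        # Same effect as the existing handler: SIGKILL the whole process
        # group so no wedged threads or child processes are left behind.
        os.killpg(os.getpgid(0), signal.SIGKILL)

    def on_failure(self, exception_type, exception_value, tb):
        # Fall back to the logging module if the actor has no _logger.
        lg = getattr(self, "_logger", logging)
        # No exception-type filter: any unhandled exception is fatal.
        lg.critical(
            "Unhandled exception is a fatal error, killing Node Manager",
            exc_info=(exception_type, exception_value, tb))
        self._kill_process_group()
```

Logging the exception before the kill matters here because SIGKILL gives the process no chance to report anything afterwards. Keeping the kill in a separate method lets tests override it instead of terminating the test runner.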