Bug #8932
[Node manager] Always crash on_failure()
Status:
Closed
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
-
Story points:
-
Description
Currently Node manager kills itself on certain types of actor failure:
def on_failure(self, exception_type, exception_value, tb):
    lg = getattr(self, "_logger", logging)
    if (exception_type in (threading.ThreadError, MemoryError) or
            exception_type is OSError and exception_value.errno == errno.ENOMEM):
        lg.critical("Unhandled exception is a fatal error, killing Node Manager")
        os.killpg(os.getpgid(0), 9)
However, experience suggests that an unexpected/unhandled actor failure (which stops the actor) usually causes Node Manager to misbehave at best, or wedges it completely at worst. Especially now that #8799 is merged (so Node Manager can recover when a shutdown actor is interrupted), I propose that Node Manager should kill itself on all unhandled exceptions.
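A minimal sketch of what the proposed behavior might look like, assuming the same on_failure(exception_type, exception_value, tb) hook and the same os.killpg call as the current handler. The class name CrashOnFailureMixin and the helper _kill_process_group are hypothetical, introduced here only for illustration; this is not the change that was actually merged.

```python
import logging
import os
import signal


class CrashOnFailureMixin:
    """Hypothetical mixin: treat every unhandled actor exception as fatal."""

    def _kill_process_group(self):
        # Same effect as the existing handler: SIGKILL (9) the whole
        # process group. Split out (an assumption of this sketch) so it
        # can be replaced in tests.
        os.killpg(os.getpgid(0), signal.SIGKILL)

    def on_failure(self, exception_type, exception_value, tb):
        # Fall back to the logging module if the actor has no _logger,
        # as the current handler does.
        lg = getattr(self, "_logger", logging)
        # No filtering on exception type: any unhandled failure is fatal.
        lg.critical(
            "Unhandled exception is a fatal error, killing Node Manager",
            exc_info=(exception_type, exception_value, tb),
        )
        self._kill_process_group()
```

Passing exc_info records the traceback before the process group is killed, which may help diagnose why the actor failed. Keeping the kill in a separate method is only a design convenience here, so the handler can be exercised without terminating the test process.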