Bug #8932

Updated by Peter Amstutz about 8 years ago

Currently, Node Manager kills itself only on certain types of actor failure:

 <pre> 
     def on_failure(self, exception_type, exception_value, tb): 
         lg = getattr(self, "_logger", logging) 
         if (exception_type in (threading.ThreadError, MemoryError) or 
             exception_type is OSError and exception_value.errno == errno.ENOMEM): 
             lg.critical("Unhandled exception is a fatal error, killing Node Manager") 
             os.killpg(os.getpgid(0), 9) 

 </pre> 

However, experience suggests that an unexpected, unhandled actor failure (which stops the actor) usually causes Node Manager to misbehave at best, or wedges it completely at worst. Especially now that #8799 is merged (so Node Manager can recover when a shutdown actor is interrupted), I propose that Node Manager kill itself on all unhandled exceptions.
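A minimal sketch of what the proposed behavior might look like, assuming the same @on_failure@ signature as the snippet above. The names @FatalOnFailureMixin@, @kill_process_group@, and the overridable @_kill@ hook are hypothetical and exist only for illustration (the hook lets a test observe the kill without the process actually dying); this is not the actual patch.

```python
import logging
import os
import signal


def kill_process_group():
    # Same effect as os.killpg(os.getpgid(0), 9) in the current code:
    # send SIGKILL to the whole process group of this process.
    os.killpg(os.getpgid(0), signal.SIGKILL)


class FatalOnFailureMixin(object):
    """Hypothetical mixin: treat every unhandled actor exception as fatal."""

    # Overridable hook (an assumption for this sketch) so tests can
    # record the call instead of killing the test process.
    _kill = staticmethod(kill_process_group)

    def on_failure(self, exception_type, exception_value, tb):
        # Fall back to the logging module if the actor has no _logger,
        # as the current code does.
        lg = getattr(self, "_logger", logging)
        # No filtering on exception type: any unhandled exception is fatal.
        lg.critical("Unhandled %s in actor, killing Node Manager: %s",
                    exception_type.__name__, exception_value)
        self._kill()
```

The only behavioral difference from the current code is that the exception-type check is gone, so an exception such as @ValueError@ would take the same path as @MemoryError@ does today.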
