<p>Arvados - Idea #8437: [Node Manager] Actors define on_failure to terminate the process on exceptions that are difficult to recover</p>
<p>Brett Smith (brett.smith@curii.com), 2016-02-16T19:16:03Z, https://dev.arvados.org/issues/8437?journal_id=35301</p>
<ul><li><strong>Story points</strong> set to <i>1.0</i></li></ul>
<p>Brett Smith (brett.smith@curii.com), 2016-02-17T19:45:14Z, https://dev.arvados.org/issues/8437?journal_id=35386</p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/35386/diff?detail_id=34701">diff</a>)</li></ul>
<p>Peter Amstutz (peter.amstutz@curii.com), 2016-02-17T20:34:29Z, https://dev.arvados.org/issues/8437?journal_id=35411</p>
<ul><li><strong>Assigned To</strong> set to <i>Peter Amstutz</i></li></ul>
<p>Peter Amstutz (peter.amstutz@curii.com), 2016-02-24T21:19:03Z, https://dev.arvados.org/issues/8437?journal_id=35723</p>
<p><code>on_failure</code> is only called when no Future is associated with the message. As it turns out, every call made through an <code>ActorProxy</code> has an associated Future object, and all messaging between actors in Node Manager goes through <code>ActorProxy</code>. As a result, unhandled exceptions are stored in a Future object to be returned to the caller. However, if the caller never calls <code>get()</code> on that Future (because it never stored it), the exception is silently ignored.</p>
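<p>The same pitfall can be reproduced without pykka. The minimal sketch below swaps in Python's standard <code>concurrent.futures</code> (an analogy, not the Node Manager code): an exception captured in a discarded Future vanishes, the same exception is re-raised by <code>result()</code> (the analogue of pykka's <code>get()</code>), and a hypothetical done-callback (<code>report_failure</code>) notices the failure even though no caller asks for the result.</p>

```python
import concurrent.futures


def failing_task():
    # Stands in for an actor method that raises an unhandled exception.
    raise MemoryError("simulated allocation failure")


def fire_and_forget(executor):
    # The Future is discarded, like calling proxy.method() without keeping
    # the result: the exception is stored in the Future and never re-raised.
    executor.submit(failing_task)


def call_and_check(executor):
    # Keeping the Future and calling result() re-raises the stored exception.
    future = executor.submit(failing_task)
    try:
        future.result()
    except MemoryError as exc:
        return exc
    return None


reported = []


def report_failure(future):
    # Hypothetical mitigation: inspect every finished Future so a failure is
    # noticed even if no caller ever retrieves the result.
    exc = future.exception()
    if exc is not None:
        reported.append(exc)


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    fire_and_forget(executor)  # nothing is printed or raised
    caught = call_and_check(executor)
    watched = executor.submit(failing_task)
    watched.add_done_callback(report_failure)

print(type(caught).__name__)  # MemoryError
print(len(reported))          # 1
```
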
<p>These lingering Future objects may also be creating circular references that are causing the memory leak.</p>
<p>Peter Amstutz (peter.amstutz@curii.com), 2016-02-26T14:12:35Z, https://dev.arvados.org/issues/8437?journal_id=35782</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
<p>Nico César, 2016-02-26T16:43:14Z, https://dev.arvados.org/issues/8437?journal_id=35793</p>
<p>reviewing <code>0a1c109684c62f0bc42e7dca30319fc8222dbef7</code></p>
<p>There are 3 cases where this exception is raised, but the test only covers <code>MemoryError</code>.</p>
<p>It would be cool to also have tests for <code>threading.ThreadError</code> and for <code>OSError</code> with <code>exception_value.errno == errno.ENOMEM</code>; down the road, this could catch other things.</p>
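<p>One way to cover all three cases is sketched below. This is not the code from the changeset: <code>is_hard_to_recover</code> and <code>TerminateOnFatalError</code> are hypothetical names, and the <code>on_failure(exception_type, exception_value, traceback)</code> signature is assumed to match pykka's actor hook. Note that in Python 3 <code>threading.ThreadError</code> is an alias for <code>RuntimeError</code>, so that check also matches any <code>RuntimeError</code>.</p>

```python
import errno
import os
import threading


def is_hard_to_recover(exception_value):
    """Hypothetical predicate for errors after which the process should exit.

    Covers the three cases from the review: MemoryError, threading.ThreadError
    (for example "can't start new thread"), and OSError with errno ENOMEM.
    """
    if isinstance(exception_value, MemoryError):
        return True
    # In Python 3, threading.ThreadError is RuntimeError, so this matches
    # every RuntimeError, not only thread-creation failures.
    if isinstance(exception_value, threading.ThreadError):
        return True
    if isinstance(exception_value, OSError) and exception_value.errno == errno.ENOMEM:
        return True
    return False


class TerminateOnFatalError:
    """Hypothetical mixin for an actor class; assumes a pykka-style hook."""

    # Indirection so tests can substitute a recorder for os._exit.
    exit_process = staticmethod(os._exit)

    def on_failure(self, exception_type, exception_value, traceback):
        if is_hard_to_recover(exception_value):
            # os._exit ends the process immediately without running cleanup
            # handlers, so it works even if other threads are wedged; an
            # external supervisor is assumed to restart the service.
            self.exit_process(1)
```

<p>A test per case can then call <code>is_hard_to_recover</code> with <code>MemoryError()</code>, <code>threading.ThreadError("can't start new thread")</code>, and <code>OSError(errno.ENOMEM, "Cannot allocate memory")</code>, plus a negative case such as <code>OSError(errno.ENOENT, ...)</code>.</p>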
<p>the rest LGTM</p>
<p>Peter Amstutz (peter.amstutz@curii.com), 2016-02-29T15:35:06Z, https://dev.arvados.org/issues/8437?journal_id=35843</p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>Applied in changeset arvados|commit:6ed351ec65d657c27b48b4e4ac0c89d880a2fd1a.</p>