Bug #11925
closed
[Nodemanager] Fix unit tests
Updated by Tom Morris over 7 years ago
- Target version changed from 2017-07-19 sprint to 2017-08-02 sprint
Updated by Peter Amstutz over 7 years ago
11925-nodemanager-watchdog-test @ 97bb15198ff6071d656d461b27e1055d84826d36
Updated by Radhika Chippada over 7 years ago
The WatchdogActorTest still fails for me in my dev env, as before (this is the only nodemanager test that always fails for me locally):
======================================================================
FAIL: test_time_timout (tests.test_failure.WatchdogActorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/tmp.ZsPoCMCUKb/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
return func(*args, **keywargs)
File "/home/rc/arvados/services/nodemanager/tests/test_failure.py", line 58, in test_time_timout
self.assertTrue(kill_mock.called)
AssertionError: False is not true
Updated by Peter Amstutz over 7 years ago
11925-nodemanager-watchdog-test @ f313294f95e55f595ace70e2a614557c0428f2da
The fix (adding an extra wait to the test) is a bit of a hack but it is the only thing I've tried that seems to work.
Updated by Lucas Di Pentima over 7 years ago
I don't know if it's related, but I'm seeing this test fail almost every run:
======================================================================
ERROR: test_arvados_node_not_cleaned_after_shutdown_cancelled (tests.test_computenode_dispatch_slurm.SLURMComputeNodeShutdownActorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/lucas/arvados_local/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
return func(*args, **keywargs)
File "/home/lucas/arvados_local/services/nodemanager/tests/test_computenode_dispatch.py", line 244, in test_arvados_node_not_cleaned_after_shutdown_cancelled
self.check_success_flag(False, 2)
File "/home/lucas/arvados_local/services/nodemanager/tests/test_computenode_dispatch.py", line 200, in check_success_flag
last_flag = self.shutdown_actor.success.get(self.TIMEOUT)
File "/home/lucas/arvados_local/tmp/VENVDIR/local/lib/python2.7/site-packages/pykka/threading.py", line 52, in get
compat.reraise(*self._data['exc_info'])
File "/home/lucas/arvados_local/tmp/VENVDIR/local/lib/python2.7/site-packages/pykka/compat.py", line 12, in reraise
exec('raise tp, value, tb')
File "<string>", line 1, in <module>
ActorDeadError: ComputeNodeShutdownActor (urn:uuid:21137c1c-a77b-4d58-afd0-749333685eba) stopped before handling the message
Updated by Peter Amstutz over 7 years ago
Some structural reasons for failing node manager tests:
- Tests were written with the assumption that certain communications between actors (threads) were synchronous, which provided some sequencing. That assumption stopped holding with #8543, which changed the majority of messaging from synchronous to asynchronous.
- Code that changes the behavior of mocks on the fly has to be carefully synchronized so that the change takes effect without racing with the code that's about to call the mock.
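To illustrate the second point, here is a minimal sketch of one way to sequence a mock change against an asynchronous actor: send a ping and wait for the reply before touching the mock, so every message queued earlier has already been handled. It does not use pykka or the real nodemanager classes. TinyActor, its poll/ping/stop methods, and run_barrier_demo are hypothetical names built on the standard library, and it uses Python 3's unittest.mock rather than the mock package from the tracebacks.

```python
import queue
import threading
from unittest import mock


class TinyActor:
    """Hypothetical actor: one thread draining a FIFO mailbox."""

    def __init__(self, backend):
        self._backend = backend
        self._inbox = queue.Queue()
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            kind, reply = self._inbox.get()
            if kind == "poll":
                # The mock is called here, on the actor thread.
                self.results.append(self._backend.status())
            reply.set()
            if kind == "stop":
                return

    def _send(self, kind):
        # Asynchronous send: returns immediately with an Event that is
        # set once the actor thread has handled the message.
        reply = threading.Event()
        self._inbox.put((kind, reply))
        return reply

    def poll(self):
        return self._send("poll")

    def ping(self):
        return self._send("ping")

    def stop(self):
        return self._send("stop")


def run_barrier_demo(timeout=5):
    backend = mock.Mock()
    backend.status.return_value = "busy"
    actor = TinyActor(backend)

    actor.poll()
    actor.poll()
    # Barrier: the mailbox is FIFO, so once the ping is answered both
    # earlier polls have already called the mock.
    assert actor.ping().wait(timeout)

    # Only now is it safe to change the mock's behavior.
    backend.status.return_value = "idle"
    actor.poll()
    assert actor.ping().wait(timeout)
    assert actor.stop().wait(timeout)
    return actor.results
```

Without the first ping, the change to return_value could land before either queued poll runs, and the test would see "idle" where it expected "busy".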
Updated by Peter Amstutz over 7 years ago
11925-nodemanager-watchdog-test @ 0ac98ea67157ab1a6d92b02e59b8491d90dd1f79
Fixes flaky tests in test_computenode_dispatch_slurm. (Passed 30 times in a row with no failures).
Updated by Peter Amstutz over 7 years ago
- Target version changed from 2017-08-02 sprint to 2017-08-16 sprint
Updated by Peter Amstutz over 7 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 7 years ago
- Subject changed from [Nodemanager] Fix watchdog test to [Nodemanager] Fix unit tests
======================================================================
ERROR: test_arvados_node_not_cleaned_after_shutdown_cancelled (tests.test_computenode_dispatch.ComputeNodeShutdownActorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/src/arvados/services/nodemanager/tests/test_computenode_dispatch.py", line 245, in test_arvados_node_not_cleaned_after_shutdown_cancelled
self.check_success_flag(False, 2)
File "/usr/src/arvados/services/nodemanager/tests/test_computenode_dispatch.py", line 200, in check_success_flag
last_flag = self.shutdown_actor.success.get(self.TIMEOUT)
File "/var/lib/arvados/test/VENVDIR/local/lib/python2.7/site-packages/pykka/threading.py", line 52, in get
compat.reraise(*self._data['exc_info'])
File "/var/lib/arvados/test/VENVDIR/local/lib/python2.7/site-packages/pykka/compat.py", line 12, in reraise
exec('raise tp, value, tb')
File "/var/lib/arvados/test/VENVDIR/local/lib/python2.7/site-packages/pykka/actor.py", line 431, in ask
self.tell(message)
File "/var/lib/arvados/test/VENVDIR/local/lib/python2.7/site-packages/pykka/actor.py", line 398, in tell
raise ActorDeadError('%s not found' % self)
ActorDeadError: ComputeNodeShutdownActor (urn:uuid:2495c712-d4fe-4801-8f7d-2a17afef3d25) not found
======================================================================
ERROR: test_arvados_node_not_cleaned_after_shutdown_cancelled (tests.test_computenode_dispatch_slurm.SLURMComputeNodeShutdownActorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/var/lib/arvados/test/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
return func(*args, **keywargs)
File "/usr/src/arvados/services/nodemanager/tests/test_computenode_dispatch.py", line 244, in test_arvados_node_not_cleaned_after_shutdown_cancelled
self.shutdown_actor.ping().get(self.TIMEOUT)
File "/var/lib/arvados/test/VENVDIR/local/lib/python2.7/site-packages/pykka/threading.py", line 52, in get
compat.reraise(*self._data['exc_info'])
File "/var/lib/arvados/test/VENVDIR/local/lib/python2.7/site-packages/pykka/compat.py", line 12, in reraise
exec('raise tp, value, tb')
File "<string>", line 1, in <module>
ActorDeadError: ComputeNodeShutdownActor (urn:uuid:04717cf8-b0f1-4def-87ed-a7ff786a83c4) stopped before handling the message
Updated by Peter Amstutz over 7 years ago
Another systemic issue:
Python mocks seem to be unreliable when mock.patch() is used to shadow standard library functions (like time.time()) that are then accessed across threads. The patched functions sometimes get arbitrarily reset back to their original values, despite the fact that the mock teardown shouldn't have executed yet. The solution seems to be to pass an explicit mock function through to the code under test instead of relying on mock.patch().
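A minimal sketch of that approach, assuming the code under test can accept its clock as a parameter: the test hands in a fake clock object, so nothing in the time module is patched and there is nothing to be reset behind the test's back. FakeClock, Watchdog, and run_clock_demo are hypothetical names for illustration and are not the actual nodemanager watchdog code.

```python
import threading
import time


class FakeClock:
    """Thread-safe stand-in for time.time() that only moves when told to."""

    def __init__(self, start=0.0):
        self._now = start
        self._lock = threading.Lock()

    def __call__(self):
        with self._lock:
            return self._now

    def advance(self, seconds):
        with self._lock:
            self._now += seconds


class Watchdog:
    """Hypothetical watchdog: fires on_timeout when no ping has arrived
    for more than `timeout` seconds, measured with the injected clock."""

    def __init__(self, timeout, on_timeout, clock=time.time):
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._clock = clock  # production code keeps the real time.time
        self._last_ping = clock()

    def ping(self):
        self._last_ping = self._clock()

    def check(self):
        if self._clock() - self._last_ping > self._timeout:
            self._on_timeout()
            return True
        return False


def check_in_thread(dog):
    # Run the check on a separate thread, as an actor would.
    out = []
    worker = threading.Thread(target=lambda: out.append(dog.check()))
    worker.start()
    worker.join()
    return out[0]


def run_clock_demo():
    clock = FakeClock(start=1000.0)
    fired = threading.Event()
    dog = Watchdog(timeout=10, on_timeout=fired.set, clock=clock)

    before = check_in_thread(dog)  # no time has passed yet
    clock.advance(11)              # move past the timeout
    after = check_in_thread(dog)
    return before, after, fired.is_set()
```

Because the fake clock is an ordinary object owned by the test, every thread sees the same value for as long as the test holds a reference to it.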
Updated by Lucas Di Pentima over 7 years ago
Updates @ 597b742a6 LGTM.
Some tests in test_daemon.py are failing once in a while; the rest seem to be reliable.
Updated by Tom Morris over 7 years ago
- Target version changed from 2017-08-16 sprint to 2017-08-30 Sprint
Updated by Peter Amstutz over 7 years ago
- Status changed from In Progress to Resolved