Bug #9986
Closed: [FUSE] [Testing] arv-mount tests deadlock frequently
Added by Tom Clegg over 8 years ago. Updated over 8 years ago.
Updated by Tom Clegg over 8 years ago
It seems that when the llfuse thread doesn't end (which we notice and log), the next test case will/might deadlock.
Updated by Tom Clegg over 8 years ago
- avoids calling multiprocessing.Pool.terminate()
- abandons the entire test suite with "kill -9 self" (rather than risk deadlock) when the llfuse thread doesn't join in a reasonable time after a test
- uses one shared multiprocessing.Pool(1, maxtasksperchild=1) for all integration tests instead of creating a new pool for each test case
- sometimes prints a bunch of harmless stderr [1] when exiting tests, seemingly due to a multiprocessing shutdown race (this is a bit ugly, but preferable to deadlock)
[1] Exception TypeError: TypeError("'NoneType' object does not support item deletion",) in <Finalize object, dead> ignored
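The changes listed above could look roughly like the following. This is a minimal sketch, not the code from the branch: the helper names `shared_pool`, `kill_self` and `join_or_abort`, the injectable `abort` callback and the 10-second default are assumptions made for illustration, and `signal.SIGKILL` is only available on Unix-like systems.

```python
import multiprocessing
import os
import signal
import threading

_pool = None  # hypothetical module-level cache for the shared pool


def shared_pool():
    """Return one worker pool reused by every integration test."""
    global _pool
    if _pool is None:
        # One worker process, replaced with a fresh one after each task.
        _pool = multiprocessing.Pool(1, maxtasksperchild=1)
    return _pool


def kill_self():
    """Equivalent of "kill -9 self": end the whole test run immediately."""
    os.kill(os.getpid(), signal.SIGKILL)


def join_or_abort(thread, timeout=10.0, abort=kill_self):
    """Wait for `thread`; call `abort` if it is still running afterwards.

    Returns True if the thread ended in time, False otherwise (the False
    return is only reachable when `abort` is replaced by something that
    does not kill the process).
    """
    thread.join(timeout)
    if thread.is_alive():
        abort()
        return False
    return True
```

Making `abort` a parameter is only a convenience so the helper can be exercised without killing the interpreter; in a real test run the default would be left in place.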
Updated by Lucas Di Pentima over 8 years ago
Some questions/observations:
- File mount_test_base.py:
  - Why adding code to wait for additional 10 secs instead of just waiting up to 11 secs from the beginning? (line 76)
  - If an additional 10 sec wait is needed, separate from the initial 1 sec: wouldn't it be more correct to put the is_alive() check in the same block as the join() call?
- File integration_test.py:
  - Is the conditional if m.llfuse_thread.is_alive() ever reached by being after the return statement on the context manager block? Maybe I’m missing something about the decorators behaviour. (line 81)
Updated by Tom Clegg over 8 years ago
Lucas Di Pentima wrote:
- Why adding code to wait for additional 10 secs instead of just waiting up to 11 secs from the beginning? (line 76)
Mainly to expose whether it ever takes >1s to join (as opposed to "always either joins fast, or deadlocks"). But doing one join() and printing the actual time taken would make more sense. Updated.
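A single join() that reports how long it took might be sketched as below. The function name `timed_join` and the 11-second default (the 1 + 10 seconds discussed above) are assumptions for illustration, and `time.monotonic` assumes Python 3.

```python
import threading
import time


def timed_join(thread, timeout=11.0):
    """Join once and return (joined, seconds_waited)."""
    start = time.monotonic()
    thread.join(timeout)
    elapsed = time.monotonic() - start
    # join() returns None either way, so is_alive() tells us what happened.
    return (not thread.is_alive(), elapsed)
```

Logging the second element of the result after every test would show whether joins are always fast or occasionally take more than a second without deadlocking.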
- Is the conditional if m.llfuse_thread.is_alive() ever reached by being after the return statement on the context manager block? Maybe I’m missing something about the decorators behaviour. (line 81)
Oops, you're right. Moved this to a "finally". Also fixed the IntegrationTest __enter__ so the "with ... as m" construct actually works as intended.
→ 4cb8583
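The two fixes described here can be illustrated in isolation. This is a sketch with hypothetical names (`FakeMount`, `run_with_check`), not the actual IntegrationTest code: it shows that a check placed in `finally` still runs when the `with` body returns, and that `__enter__` has to return the object for `with ... as m` to bind it.

```python
class FakeMount:
    """Hypothetical stand-in for the object whose llfuse thread is checked."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        # Without this return, "with ... as m" would bind m to None.
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not swallow exceptions


def run_with_check(mount, body):
    """Run body(m) inside the mount, then always run the post-test check."""
    with mount as m:
        try:
            return body(m)
        finally:
            # Reached even though the try block returns; code placed
            # after the return statement would never run.
            m.events.append("check")
```

With this layout the order of events is enter, body, check, exit, whether the body returns normally or raises.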
Updated by Tom Clegg over 8 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:503c686bc80825d00980a970af69ec60f9e6ce9b.
Updated by Tom Clegg over 8 years ago
- Status changed from Resolved to In Progress
- Assigned To set to Tom Clegg
Reopening to do "retry after deadlock+sigkill".
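One way "retry after deadlock+sigkill" could work is for a parent process to rerun the test command whenever the child was ended by SIGKILL. The thread does not say how the retry was implemented, so the following is only an illustrative sketch: `run_with_retry` and `max_attempts` are hypothetical names, and the negative return code convention applies to POSIX systems.

```python
import signal
import subprocess


def run_with_retry(cmd, max_attempts=2):
    """Run `cmd`, rerunning it only when the child died from SIGKILL.

    Returns (returncode, attempts_used).
    """
    returncode = None
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.call(cmd)
        # On POSIX a child killed by signal N reports returncode -N.
        if returncode != -signal.SIGKILL:
            return returncode, attempt
    return returncode, max_attempts
```

Ordinary test failures (any other non-zero exit status) are returned immediately, so only the self-inflicted kill triggers another attempt.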
Updated by Lucas Di Pentima over 8 years ago
Looks good to me. If this is permanent, maybe the function should be called "do_test_once_sometimes()" :)
Updated by Tom Clegg over 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:3bf898db1a6f0db043060cd601131b17bd6ef82d.