Bug #22056
Closed
FUSE test is hanging
Updated by Peter Amstutz 6 months ago
FUSE tests have been failing and hanging.
Updated by Peter Amstutz 6 months ago
I had FUSE fail locally inside arvbox with the following errors, but it doesn't happen reliably.
Traceback (most recent call last):
  File "/var/lib/arvados-arvbox/test/VENV3DIR/lib/python3.9/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
    content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
OSError: [Errno 12] Cannot allocate memory

2024-08-21 19:35:50 arvados.arv-mount[3835] ERROR: exception during setup: can't start new thread
Traceback (most recent call last):
  File "/usr/src/arvados/services/fuse/arvados_fuse/command.py", line 401, in __init__
    self._setup_mount()
  File "/usr/src/arvados/services/fuse/arvados_fuse/command.py", line 508, in _setup_mount
    self.operations = Operations(
  File "/usr/src/arvados/services/fuse/arvados_fuse/__init__.py", line 607, in __init__
    self.inodes = Inodes(inode_cache, encoding=encoding, fsns=fsns,
  File "/usr/src/arvados/services/fuse/arvados_fuse/__init__.py", line 300, in __init__
    self._inode_remove_thread.start()
  File "/usr/lib/python3.9/threading.py", line 874, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
The failed runs are not producing error logs because they hang and have to be aborted. So we need to tweak the test runner to print out errors immediately.
Updated by Brett Smith 6 months ago
Peter Amstutz wrote in #note-4:
The failed runs are not producing error logs because they hang and have to be aborted. So we need to tweak the test runner to print out errors immediately.
Two things you can do, both in pytest.ini. One, use the --capture option:
addopts = --capture=tee-sys
For the record: this is great for debugging, but I would personally prefer we not merge this change into main. Our tests log a ton of noise, and that also creates developer overhead when dealing with regular test failures. Of course this isn't my call alone, we can talk about it as a team, just putting it out there.
Another thing you can add that might also help and would be fine to merge to main is to set faulthandler_timeout:
faulthandler_timeout = SECS
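For context, pytest's faulthandler_timeout is built on Python's faulthandler.dump_traceback_later(), which dumps the tracebacks of all threads once a timer expires. The sketch below shows that mechanism on its own, outside pytest. The names run_with_watchdog and slow_step are hypothetical and exist only for illustration; they are not part of Arvados or pytest.

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(func, timeout_secs, log_file):
    """Run func(); if it is still running after timeout_secs,
    dump the tracebacks of all threads to log_file and keep going."""
    # exit=False: only report the stall, do not kill the process.
    faulthandler.dump_traceback_later(timeout_secs, exit=False, file=log_file)
    try:
        return func()
    finally:
        # Disarm the timer so it cannot fire after func() has returned.
        faulthandler.cancel_dump_traceback_later()


def slow_step():
    # Hypothetical stand-in for a test that stalls.
    time.sleep(0.5)
    return "done"


# faulthandler writes straight to the file descriptor, so it needs a real file.
with tempfile.TemporaryFile(mode="w+") as log_file:
    result = run_with_watchdog(slow_step, timeout_secs=0.1, log_file=log_file)
    log_file.seek(0)
    dump = log_file.read()

print(result)
print(dump)
```

Because the dump is written when the timer fires rather than when the test finishes, it should still be available for a run that hangs and has to be aborted.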
Updated by Peter Amstutz 5 months ago
- Target version changed from Development 2024-08-28 sprint to Development 2024-09-11 sprint
Updated by Peter Amstutz 5 months ago
- Target version changed from Development 2024-09-11 sprint to Development 2024-09-25 sprint
Updated by Peter Amstutz 4 months ago
- Status changed from In Progress to Closed
We haven't been able to reproduce this, and it hasn't happened recently, so I don't think there is anything actionable to do right now. Closing.