Bug #22056

FUSE test is hanging

Added by Peter Amstutz 6 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Assigned To:
Category: -
Story points: -
Actions #1

Updated by Peter Amstutz 6 months ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz 6 months ago

FUSE tests have been failing and hanging:

developer-run-tests-doc-pysdk-api-fuse: #348 /console

Actions #3

Updated by Peter Amstutz 6 months ago

  • Assigned To set to Peter Amstutz
Actions #4

Updated by Peter Amstutz 6 months ago

I had FUSE fail locally inside arvbox with the following errors, but it doesn't happen reliably.

Traceback (most recent call last):
  File "/var/lib/arvados-arvbox/test/VENV3DIR/lib/python3.9/site-packages/arvados/_internal/diskcache.py", line 177, in get_from_disk
    content = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
OSError: [Errno 12] Cannot allocate memory
2024-08-21 19:35:50 arvados.arv-mount[3835] ERROR: exception during setup: can't start new thread
Traceback (most recent call last):
  File "/usr/src/arvados/services/fuse/arvados_fuse/command.py", line 401, in __init__
    self._setup_mount()
  File "/usr/src/arvados/services/fuse/arvados_fuse/command.py", line 508, in _setup_mount
    self.operations = Operations(
  File "/usr/src/arvados/services/fuse/arvados_fuse/__init__.py", line 607, in __init__
    self.inodes = Inodes(inode_cache, encoding=encoding, fsns=fsns,
  File "/usr/src/arvados/services/fuse/arvados_fuse/__init__.py", line 300, in __init__
    self._inode_remove_thread.start()
  File "/usr/lib/python3.9/threading.py", line 874, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

The failed runs are not producing error logs because they hang and have to be aborted. So we need to tweak the test runner to print out errors immediately.
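Both errors above (mmap failing with ENOMEM and "can't start new thread") are commonly associated with a process running into a resource limit, though nothing in this ticket confirms that is the cause here. As a rough diagnostic sketch, assuming a Linux host, something like the following could be logged from a test fixture to record the thread count and the relevant limits when a run starts misbehaving. The helper name `resource_snapshot` is hypothetical and is not part of arvados or arv-mount.

```python
import resource
import threading


def resource_snapshot():
    """Return the live thread count plus selected soft/hard resource limits.

    Hypothetical helper for illustration only. A limit value of
    resource.RLIM_INFINITY (-1) means "unlimited".
    """
    def limit(name):
        soft, hard = resource.getrlimit(name)
        return {"soft": soft, "hard": hard}

    return {
        # Threads currently alive in this Python process.
        "active_threads": threading.active_count(),
        # Maximum virtual address space; mmap can fail with ENOMEM when exceeded.
        "address_space": limit(resource.RLIMIT_AS),
        # Maximum processes for this user; on Linux, threads count against this.
        "processes": limit(resource.RLIMIT_NPROC),
    }


if __name__ == "__main__":
    print(resource_snapshot())
```

This only reports per-process limits. System-wide settings can also produce the same errors and would need to be checked separately.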

Actions #5

Updated by Brett Smith 6 months ago

Peter Amstutz wrote in #note-4:

The failed runs are not producing error logs because they hang and have to be aborted. So we need to tweak the test runner to print out errors immediately.

Two things you can do, both in pytest.ini. One, use the --capture option:

addopts = --capture=tee-sys

For the record: this is great for debugging, but I would personally prefer we not merge this change into main. Our tests log a ton of noise, and that also creates developer overhead when dealing with regular test failures. Of course this isn't my call alone; we can talk about it as a team. Just putting it out there.

Another thing you can add, which might also help and would be fine to merge to main, is faulthandler_timeout:

faulthandler_timeout = SECS
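
For context, pytest implements faulthandler_timeout with the standard library's `faulthandler.dump_traceback_later()`. The sketch below shows that mechanism on its own, outside pytest: a watchdog dumps every thread's traceback if a call runs longer than the timeout, which is the kind of output that would show where a hung FUSE test is stuck. The names `run_with_watchdog` and `slow_test` are made up for illustration, and the timings are arbitrary.

```python
import faulthandler
import sys
import time


def run_with_watchdog(func, timeout_secs, log_file=sys.stderr):
    """Run func(); if it takes longer than timeout_secs, dump the
    tracebacks of all threads to log_file without killing the process."""
    faulthandler.dump_traceback_later(timeout_secs, exit=False, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog so it does not fire after func() returns.
        faulthandler.cancel_dump_traceback_later()


def slow_test():
    """Stand-in for a test that takes longer than expected."""
    time.sleep(0.5)
    return "finished"


if __name__ == "__main__":
    # The watchdog fires after 0.1s and writes tracebacks to stderr,
    # then slow_test() carries on and returns normally.
    print(run_with_watchdog(slow_test, 0.1))
```

Because the dump is written straight to the file descriptor, it shows up even when the test never returns, which is the situation described in #note-4.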
Actions #6

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-08-28 sprint to Development 2024-09-11 sprint
Actions #7

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-09-11 sprint to Development 2024-09-25 sprint
Actions #8

Updated by Peter Amstutz 4 months ago

  • Status changed from In Progress to Closed

We haven't been able to reproduce this, and it hasn't happened recently, so I don't think there is anything actionable to do right now. Closing.
