Project

General

Profile

Actions

Bug #6706

closed

[FUSE] Crash in forget()

Added by Peter Amstutz almost 9 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
FUSE
Target version:
Story points:
0.5

Description

I think this is getting called unexpectedly during shutdown, and self.inodes == None:

2015-07-21_23:55:01 su92l-8i9sb-m4m1girz122q3f1 4686 5 stderr 2015-07-21 23:55:00 arvados.arvados_fuse[9793] ERROR: Unhandled exception during FUSE operation
2015-07-21_23:55:01 su92l-8i9sb-m4m1girz122q3f1 4686 5 stderr Traceback (most recent call last):
2015-07-21_23:55:01 su92l-8i9sb-m4m1girz122q3f1 4686 5 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados_fuse/__init__.py", line 254, in catch_exceptions_wrapper
2015-07-21_23:55:01 su92l-8i9sb-m4m1girz122q3f1 4686 5 stderr     return orig_func(self, *args, **kwargs)
2015-07-21_23:55:01 su92l-8i9sb-m4m1girz122q3f1 4686 5 stderr   File "/usr/local/lib/python2.7/dist-packages/arvados_fuse/__init__.py", line 443, in forget
2015-07-21_23:55:01 su92l-8i9sb-m4m1girz122q3f1 4686 5 stderr     ent = self.inodes[inode]
2015-07-21_23:55:01 su92l-8i9sb-m4m1girz122q3f1 4686 5 stderr TypeError: 'NoneType' object has no attribute '__getitem__'

Subtasks 2 (0 open2 closed)

Task #6731: Review 6706-fuse-forget-crashResolvedBrett Smith07/23/2015Actions
Task #6730: FixResolvedPeter Amstutz07/23/2015Actions
Actions #1

Updated by Peter Amstutz almost 9 years ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz almost 9 years ago

  • Story points set to 0.5
Actions #3

Updated by Brett Smith almost 9 years ago

  • Category set to FUSE
  • Assigned To set to Peter Amstutz
  • Target version changed from Bug Triage to 2015-08-05 sprint
Actions #4

Updated by Brett Smith almost 9 years ago

Reviewing 87ff392

This definitely does the job, and obviously it's straightforward. I wonder if we might be better off avoiding setting self.inodes = None, though? Going over the llfuse docs, it doesn't look like it would be required, and avoiding it would help prevent future surprises like this. No other methods are prepared to cope with self.inodes = None, though I agree it's hard to imagine why they would ever be called after destroy().

Just an idea, though. If it's an insufficient fix or inappropriate change, you can definitely merge this. Thanks.

Actions #5

Updated by Peter Amstutz almost 9 years ago

While fixing this, I noticed another problem:

Exception in thread WebSocketClient (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
  File "/usr/lib/python2.7/threading.py", line 763, in run
  File "build/bdist.linux-x86_64/egg/ws4py/websocket.py", line 427, in run
  File "build/bdist.linux-x86_64/egg/ws4py/websocket.py", line 305, in once
  File "build/bdist.linux-x86_64/egg/ws4py/websocket.py", line 357, in process
  File "build/bdist.linux-x86_64/egg/ws4py/streaming.py", line 185, in receiver
  File "build/bdist.linux-x86_64/egg/ws4py/framing.py", line 138, in _parsing
<type 'exceptions.TypeError'>: 'NoneType' object is not callable

Pulling on this thread (hah), it turns out that the correct way to shut down the websockets client is to use terminate(), not close() (from reading the code, it turns out the close() method only sends a shutdown message to the server and does not actually close the socket or stop the client thread).

So as a bonus bugfix I am fixing up all the places in the Arvados tree that use Python websockets client to use terminate() instead of close().

Actions #6

Updated by Peter Amstutz almost 9 years ago

Brett Smith wrote:

Reviewing 87ff392

This definitely does the job, and obviously it's straightforward. I wonder if we might be better off avoiding setting self.inodes = None, though? Going over the llfuse docs, it doesn't look like it would be required, and avoiding it would help prevent future surprises like this. No other methods are prepared to cope with self.inodes = None, though I agree it's hard to imagine why they would ever be called after destroy().

Just an idea, though. If it's an insufficient fix or inappropriate change, you can definitely merge this. Thanks.

Introduced a clear() method which wipes out the inode/inode cache entries dicts instead of setting to None. Thanks.

Actions #7

Updated by Peter Amstutz almost 9 years ago

Peter Amstutz wrote:

So as a bonus bugfix I am fixing up all the places in the Arvados tree that use Python websockets client to use terminate() instead of close().

So, whoops, this turns out to be wrong. terminate() seems to be intended as an internal method, but not documented as such. The correct way to shut down a connection is to call close() and then wait for the closed() callback.

EventClient now overrides close() to provide the desired behavior in way that's compatible with the existing code using events.py.

Actions #8

Updated by Peter Amstutz almost 9 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Brett Smith almost 9 years ago

17ce65c is good to merge. Because we're changing the semantics of EventClient.close() from its superclass, a docstring that explains that might help avoid surprises for future users. That's the only other idea I had. Thanks.

Actions #10

Updated by Peter Amstutz over 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:3e6ba5fa5f225c8aa431ce9a2796369c1e1dda2d.

Actions

Also available in: Atom PDF