Bug #17983
closed[arv-mount] 2.2.1 deadlock when --read-write and --collection is specified
Description
ward@shell:~$ arv-mount --debug --foreground --read-write --collection pirca-4zz18-nsaus8rzkfeh69m keep/ 2021-08-06 14:54:54 arvados.arv-mount[1224] DEBUG: arv-mount debugging enabled 2021-08-06 14:54:54 arvados.arv-mount[1224] INFO: /usr/bin/arv-mount 2.2.1 started 2021-08-06 14:54:54 arvados.arv-mount[1224] INFO: enable write is True 2021-08-06 14:54:54 arvados.arvados_fuse[1224] DEBUG: InodeCache touched inode 1 (size 0) (uuid pirca-4zz18-nsaus8rzkfeh69m) total now 0 (1 entries) FUSE library version: 2.9.9 unique: 1, opcode: INIT (26), nodeid: 0, insize: 56, pid: 0 INIT: 7.27 flags=0x003ffffb max_readahead=0x00020000 INIT: 7.19 flags=0x00000021 max_readahead=0x00020000 max_write=0x00020000 max_background=0 congestion_threshold=0 unique: 1, success, outsize: 40 unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 1299 unique: 2, success, outsize: 120 unique: 3, opcode: STATFS (17), nodeid: 1, insize: 40, pid: 1299 unique: 3, success, outsize: 96 unique: 4, opcode: OPENDIR (27), nodeid: 1, insize: 48, pid: 1311 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: arv-mount opendir: inode 1 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: InodeCache touched inode 1 (size 0) (uuid pirca-4zz18-nsaus8rzkfeh69m) total now 0 (1 entries) 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: Updating collection pirca-4zz18-nsaus8rzkfeh69m inode 1 to record version None 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: <arvados_fuse.fusedir.CollectionDirectory object at 0x7fb99c7c2558> manifest_size 116 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: InodeCache touched inode 1 (size 14848) (uuid pirca-4zz18-nsaus8rzkfeh69m) total now 14848 (1 entries) unique: 4, success, outsize: 32 unique: 5, opcode: READDIR (28), nodeid: 1, insize: 80, pid: 1311 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: arv-mount readdir: fh 0 off 0 unique: 5, success, outsize: 120 unique: 6, opcode: LOOKUP (1), nodeid: 1, insize: 55, pid: 1311 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: InodeCache touched inode 1 (size 14848) (uuid pirca-4zz18-nsaus8rzkfeh69m) total now 14848 (1 entries) 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: arv-mount lookup: parent_inode 1 name 'stderr (1).txt' inode 2 unique: 6, success, outsize: 144 unique: 7, opcode: READDIR (28), nodeid: 1, insize: 80, pid: 1311 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: arv-mount readdir: fh 0 off 3 unique: 7, success, outsize: 16 unique: 8, opcode: RELEASEDIR (29), nodeid: 1, insize: 64, pid: 0 2021-08-06 14:56:08 arvados.arvados_fuse[1224] DEBUG: arv-mount release fh 0 unique: 8, success, outsize: 16 unique: 9, opcode: LOOKUP (1), nodeid: 1, insize: 49, pid: 1313 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: InodeCache touched inode 1 (size 14848) (uuid pirca-4zz18-nsaus8rzkfeh69m) total now 14848 (1 entries) 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: arv-mount lookup: parent_inode 1 name 'testfile' not found unique: 9, error: -2 (No such file or directory), outsize: 16 unique: 10, opcode: CREATE (35), nodeid: 1, insize: 65, pid: 1313 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: arv-mount create: parent_inode 1 'testfile' 100644 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: collection notify add <arvados.collection.Collection object at 0x7fb99c6a4b70> testfile <arvados.arvfile.ArvadosFile object at 0x7fb99c82b048> 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: collection notify mod <arvados.collection.Collection object at 0x7fb99c6a4b70> testfile (<arvados.arvfile.ArvadosFile object at 0x7fb99c82b048>, <arvados.arvfile.ArvadosFile object at 0x7fb99c82b048>) 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: InodeCache touched inode 1 (size 14848) (uuid pirca-4zz18-nsaus8rzkfeh69m) total now 14848 (1 entries) unique: 10, success, outsize: 160 unique: 11, opcode: FLUSH (25), nodeid: 3, insize: 64, pid: 1313 unique: 11, success, outsize: 16 unique: 12, opcode: GETXATTR (22), nodeid: 3, insize: 68, pid: 1313 unique: 12, error: -38 (Function not implemented), outsize: 16 unique: 13, opcode: WRITE (16), nodeid: 3, insize: 112, pid: 1313 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: arv-mount write 1 0 32 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: collection notify write <arvados.collection.Collection object at 0x7fb99c6a4b70> testfile (<arvados.arvfile.ArvadosFile object at 0x7fb99c82b048>, <arvados.arvfile.ArvadosFile object at 0x7fb99c82b048>) unique: 13, success, outsize: 24 unique: 14, opcode: FLUSH (25), nodeid: 3, insize: 64, pid: 1313 2021-08-06 14:56:21 arvados.arvados_fuse[1224] DEBUG: collection notify mod <arvados.collection.Collection object at 0x7fb99c6a4b70> testfile (<arvados.arvfile.ArvadosFile object at 0x7fb99c82b048>, <arvados.arvfile.ArvadosFile object at 0x7fb99c82b048>) 2021-08-06 14:56:21 arvados.keep[1224] DEBUG: {'pirca-bi6l4-1cd520a55b71d5a7': {'href': '/keep_services/pirca-bi6l4-1cd520a55b71d5a7', 'kind': 'arvados#keepService', 'etag': '2k0rcryua3dxrzkrar3m1pvyc', 'uuid': 'pirca-bi6l4-1cd520a55b71d5a7', 'owner_uuid': 'pirca-tpzed-000000000000000', 'created_at': '2021-08-02T14:54:54.574032000Z', 'modified_by_client_uuid': None, 'modified_by_user_uuid': 'pirca-tpzed-000000000000000', 'modified_at': '2021-08-02T14:54:54.574032000Z', 'service_host': 'keep0.pirca.arvadosapi.com', 'service_port': 25107, 'service_ssl_flag': False, 'service_type': 'disk', 'read_only': False, '_service_root': 'http://keep0.pirca.arvadosapi.com:25107/'}, 'pirca-bi6l4-629198f41fae2d7c': {'href': '/keep_services/pirca-bi6l4-629198f41fae2d7c', 'kind': 'arvados#keepService', 'etag': '97c5ja2or0op55az5cx3dxaui', 'uuid': 'pirca-bi6l4-629198f41fae2d7c', 'owner_uuid': 'pirca-tpzed-000000000000000', 'created_at': '2021-08-02T14:54:54.574032000Z', 'modified_by_client_uuid': None, 'modified_by_user_uuid': 'pirca-tpzed-000000000000000', 'modified_at': '2021-08-02T14:54:54.574032000Z', 'service_host': 'keep1.pirca.arvadosapi.com', 'service_port': 25107, 'service_ssl_flag': False, 'service_type': 'disk', 'read_only': False, '_service_root': 'http://keep1.pirca.arvadosapi.com:25107/'}} 2021-08-06 14:56:21 arvados.keep[1224] DEBUG: 0837c95a7eebd0844371cda55d082228+32: ['http://keep1.pirca.arvadosapi.com:25107/', 'http://keep0.pirca.arvadosapi.com:25107/'] 2021-08-06 14:56:21 arvados.keep[1224] DEBUG: Pool max threads is 2 2021-08-06 14:56:21 arvados.keep[1224] DEBUG: Request: PUT http://keep1.pirca.arvadosapi.com:25107/0837c95a7eebd0844371cda55d082228 2021-08-06 14:56:21 arvados.keep[1224] DEBUG: Request: PUT http://keep0.pirca.arvadosapi.com:25107/0837c95a7eebd0844371cda55d082228 2021-08-06 14:56:21 arvados.keep[1224] INFO: PUT 200: 32 bytes in 147.55630493164062 msec (0.000 MiB/sec) 2021-08-06 14:56:21 arvados.keep[1224] DEBUG: KeepWriterThread <KeepWriterThread(Thread-37, started daemon 140433837463296)> succeeded 0837c95a7eebd0844371cda55d082228+32 http://keep0.pirca.arvadosapi.com:25107/ 2021-08-06 14:56:21 arvados.keep[1224] INFO: PUT 200: 32 bytes in 170.5644130706787 msec (0.000 MiB/sec) 2021-08-06 14:56:21 arvados.keep[1224] DEBUG: KeepWriterThread <KeepWriterThread(Thread-36, started daemon 140433845856000)> succeeded 0837c95a7eebd0844371cda55d082228+32 http://keep1.pirca.arvadosapi.com:25107/
The above backtrace was caused by (in a separate shell):
ward@shell:~$ date > keep/testfile
which just hangs.
On latest main with collection versioning disabled, the problem is identical (tested on tordo).
When the `--disable-event-listening` flag is added to `arv-mount`, the deadlock does not happen.
On latest main with collection versioning enabled (e.g. ce8i5), the deadlock is partially masked due to bug #18005.
Updated by Ward Vandewege over 3 years ago
- Description updated (diff)
- Subject changed from [arv-mount] deadlock when --read-write and --collection is specified to [arv-mount] 2.2.1 deadlock when --read-write and --collection is specified
Updated by Ward Vandewege over 3 years ago
- Target version set to 2021-08-18 sprint
Updated by Peter Amstutz over 3 years ago
Ward Vandewege wrote:
I am seeing the same "undefined method `each_line` for nil:NilClass on ce8i5 when trying to add a new property to a collection without any properties (I used wb2):
[...]
Huh, weird. Can you dig out a more thorough stack trace?
Updated by Peter Amstutz over 3 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 3 years ago
- Target version deleted (
2021-08-18 sprint)
I haven't totally pinned this down but in general terms here's what I think is happening:
- In one thread, it flushes the file.
- This releases the llfuse lock and calls collection flush.
- the collection lock is acquired
- collection flush makes a callback to notify the FUSE object that it changed
- the callback tries to re-acquire the llfuse lock
- the thread blocks on the llfuse lock
Meanwhile, in another thread
- It has the llfuse lock
- it tries to get the collection lock
- the collection lock is held by the other thread
- the thread blocks on the collection lock
The problem is fusedir.py:304
it calls llfuse.lock
It must not do that without releasing the collection lock first.
Updated by Peter Amstutz over 3 years ago
- Target version set to 2021-08-18 sprint
Updated by Peter Amstutz over 3 years ago
17983-fuse-unlock-relock @ 87b0f117ff1ce917d7301f8a6304e931a189e48d
- Unlock the collection before trying to acquire llfuse.lock, then re-acquire the lock, so that locking order (llfuse, then collection) is correct
This seems to fix the original deadlock, and I did some addition manual testing and the writable FUSE mount seems quite solid (in a way that it has never been before).
Updated by Lucas Di Pentima over 3 years ago
I've tested it manually and worked for me too. Indeed the solution looks a bit nasty but I can confirm it seems really stable.
LGTM.
Updated by Peter Amstutz over 3 years ago
- Status changed from In Progress to Resolved