Project

General

Profile

Actions

Bug #21568

closed

arv-mount double free or corruption with many concurrent accesses

Added by Brett Smith about 2 months ago. Updated 25 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
FUSE
Story points:
-
Release relationship:
Auto

Description

Steps to reproduce:

  • Write Arvados settings.conf pointed at pirca with an admin token.
  • arv-mount --foreground --shared --directory-cache=SIZE MOUNT_PATH - I have been able to reproduce with sizes as low as 2MiB and high as 1GiB, I suspect it doesn't matter
  • Start a ~simultaneous ls -lR MOUNT_PATH/SUBDIR process for each SUBDIR under MOUNT_PATH - as I write this, I see 179

Reproduction script attached. I was trying to reproduce #21541 but instead arv-mount got SIGSEGV:

Mar 01 16:14:23 arv-mount[1131697]: double free or corruption (out)
Mar 01 16:14:23 systemd[2070]: arv-mount-stress-_hkr5ofm.service: Main process exited, code=killed, status=6/ABRT
Mar 01 16:14:23 systemd[2070]: arv-mount-stress-_hkr5ofm.service: Failed with result 'signal'.

Not attaching logs because they contain a lot of private data.


Files

arv-mount-stress-test.py (4.09 KB) arv-mount-stress-test.py Version 1 Brett Smith, 03/01/2024 09:35 PM
arv-mount-stress-test.py (5.44 KB) arv-mount-stress-test.py Version 2 Brett Smith, 03/01/2024 11:57 PM

Related issues

Related to Arvados - Bug #21541: arv-mount KeyError during cap_cache - Seemingly lost track of parent inodeResolvedPeter AmstutzActions
Actions #2

Updated by Peter Amstutz about 2 months ago

  • Target version set to Development 2024-03-13 sprint
Actions #3

Updated by Brett Smith about 2 months ago

  • Description updated (diff)
Actions #4

Updated by Brett Smith about 2 months ago

  • Related to Bug #21541: arv-mount KeyError during cap_cache - Seemingly lost track of parent inode added
Actions #5

Updated by Brett Smith about 2 months ago

It is possible to reproduce this with eight concurrent accesses. See the attached version 2 of the script. The client gets either ENOTCONN or ECONNABORT.

Actions #6

Updated by Peter Amstutz about 2 months ago

Here is another fun one

Mar 05 10:24:34 curiipeter python3[162640]: waiting for mount
Mar 05 10:24:38 curiipeter python3[162640]: starting stress test #1
Mar 05 10:24:42 curiipeter arv-mount[162642]: free(): corrupted unsorted chunks
Actions #7

Updated by Peter Amstutz about 2 months ago

  • Assigned To set to Peter Amstutz
Actions #8

Updated by Peter Amstutz about 2 months ago

#0  _int_malloc (av=av@entry=0x7f6010000030, bytes=bytes@entry=16584) at ./malloc/malloc.c:4004
#1  0x00007f6097f85809 in __GI___libc_malloc (bytes=16584) at ./malloc/malloc.c:3323
#2  0x00007f6096d3ba40 in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#3  0x00007f6096d3909f in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#4  0x00007f6096d10eaa in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#5  0x00007f6096d4bff5 in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#6  0x00007f6096d3ce2d in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#7  0x00007f6096d3ab70 in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#8  0x00007f6096d0fb2e in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#9  0x00007f6096d1b7e9 in SSL_read_ex () from /lib/x86_64-linux-gnu/libssl.so.3
#10 0x00007f6096dac25d in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#11 0x00000000005230d0 in ?? ()
#12 0x000000000053ac2c in PyObject_Vectorcall ()
#13 0x000000000052b940 in _PyEval_EvalFrameDefault ()
#14 0x000000000055c931 in _PyFunction_Vectorcall ()
#15 0x0000000000522586 in PyObject_VectorcallMethod ()
#16 0x0000000000653120 in ?? ()
#17 0x00000000006532b2 in ?? ()
#18 0x000000000067693f in ?? ()
#19 0x00000000006767ec in ?? ()
#20 0x000000000052c556 in _PyEval_EvalFrameDefault ()
#21 0x00000000005855a4 in ?? ()
#22 0x0000000000585174 in ?? ()
#23 0x000000000056a381 in PyObject_Call ()
#24 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#25 0x00000000005855a4 in ?? ()
#26 0x0000000000585174 in ?? ()
#27 0x000000000056a381 in PyObject_Call ()
#28 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#29 0x000000000055c931 in _PyFunction_Vectorcall ()
#30 0x000000000056a381 in PyObject_Call ()
#31 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#32 0x000000000055c931 in _PyFunction_Vectorcall ()
#33 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#34 0x000000000055c931 in _PyFunction_Vectorcall ()
#35 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#36 0x000000000055c931 in _PyFunction_Vectorcall ()
#37 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#38 0x000000000055c931 in _PyFunction_Vectorcall ()
#39 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#40 0x000000000055c931 in _PyFunction_Vectorcall ()
#41 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#42 0x000000000055c931 in _PyFunction_Vectorcall ()
#43 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#44 0x000000000055c931 in _PyFunction_Vectorcall ()
#45 0x00007f6096f231a8 in __pyx_f_6llfuse_fuse_opendir (__pyx_v_req=0x7f6010128c50, __pyx_v_ino=<optimized out>, __pyx_v_fi=0x7f6031ffab90) at src/llfuse.c:22149
#46 0x00007f6097379266 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#47 0x00007f6097378093 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#48 0x00007f6096efd887 in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f60802fa8e0, __pyx_v_size=135168) at src/llfuse.c:44915
#49 0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f60800016a8) at src/llfuse.c:45067
#50 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#0  __GI___libc_read (nbytes=5, buf=0x7f6090015113, fd=262) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=262, buf=0x7f6090015113, nbytes=5) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f609691299d in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#3  0x00007f6096909967 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#4  0x00007f6096907f86 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#5  0x00007f60969086f3 in BIO_read () from /lib/x86_64-linux-gnu/libcrypto.so.3
#6  0x00007f6096d38a70 in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#7  0x00007f6096d3bf5d in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#8  0x00007f6096d3ab70 in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#9  0x00007f6096d0fb2e in ?? () from /lib/x86_64-linux-gnu/libssl.so.3
#10 0x00007f6096d1b7e9 in SSL_read_ex () from /lib/x86_64-linux-gnu/libssl.so.3
#11 0x00007f6096dac25d in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#12 0x00000000005230d0 in ?? ()
#13 0x000000000053ac2c in PyObject_Vectorcall ()
#14 0x000000000052b940 in _PyEval_EvalFrameDefault ()
#15 0x00000000005855a4 in ?? ()
#16 0x0000000000585148 in ?? ()
#17 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#18 0x00000000005855a4 in ?? ()
#19 0x0000000000585148 in ?? ()
#20 0x000000000067917c in ?? ()
#21 0x0000000000653cb4 in ?? ()
#22 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#23 0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0  0x00007f6097fe905f in __GI___poll (fds=0x7f607cff71c8, nfds=1, timeout=300000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f6096da4c63 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#2  0x00007f6096da4cd5 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#3  0x00000000005230d0 in ?? ()
#4  0x000000000053ac2c in PyObject_Vectorcall ()
#5  0x000000000052b940 in _PyEval_EvalFrameDefault ()
#6  0x000000000055c931 in _PyFunction_Vectorcall ()
#7  0x0000000000522586 in PyObject_VectorcallMethod ()
#8  0x0000000000653120 in ?? ()
#9  0x00000000006532b2 in ?? ()
#10 0x000000000067693f in ?? ()
#11 0x00000000006767ec in ?? ()
#12 0x000000000052c556 in _PyEval_EvalFrameDefault ()
#13 0x00000000005855a4 in ?? ()
#14 0x0000000000585174 in ?? ()
#15 0x000000000056a381 in PyObject_Call ()
#16 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#17 0x00000000005855a4 in ?? ()
#18 0x0000000000585174 in ?? ()
#19 0x000000000056a381 in PyObject_Call ()
#20 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#21 0x000000000055c931 in _PyFunction_Vectorcall ()
#22 0x000000000056a381 in PyObject_Call ()
#23 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#24 0x000000000055c931 in _PyFunction_Vectorcall ()
#25 0x000000000056a381 in PyObject_Call ()
#26 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#27 0x000000000055c931 in _PyFunction_Vectorcall ()
#28 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#29 0x000000000055c931 in _PyFunction_Vectorcall ()
#30 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#31 0x000000000055c931 in _PyFunction_Vectorcall ()
#32 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#33 0x000000000055c931 in _PyFunction_Vectorcall ()
#34 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#35 0x000000000055c931 in _PyFunction_Vectorcall ()
#36 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#37 0x000000000055c931 in _PyFunction_Vectorcall ()
#38 0x00007f6096f231a8 in __pyx_f_6llfuse_fuse_opendir (__pyx_v_req=0x7f604c11f980, __pyx_v_ino=<optimized out>, __pyx_v_fi=0x7f607cff8b90)
    at src/llfuse.c:22149
#39 0x00007f6097379266 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#40 0x00007f6097378093 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#41 0x00007f6096efd887 in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f608014c630, __pyx_v_size=135168) at src/llfuse.c:44915
#42 0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f60800014a0) at src/llfuse.c:45067
#43 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#44 0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0  0x00007f6097fe905f in __GI___poll (fds=0x7f60537fc1c8, nfds=1, timeout=299931) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f6096da4c63 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#2  0x00007f6096da4cd5 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#3  0x00000000005230d0 in ?? ()
#4  0x000000000053ac2c in PyObject_Vectorcall ()
#5  0x000000000052b940 in _PyEval_EvalFrameDefault ()
#6  0x000000000055c931 in _PyFunction_Vectorcall ()
#7  0x0000000000522586 in PyObject_VectorcallMethod ()
#8  0x0000000000653120 in ?? ()
#9  0x00000000006532b2 in ?? ()
#10 0x000000000067693f in ?? ()
#11 0x00000000006767ec in ?? ()
#12 0x000000000052c556 in _PyEval_EvalFrameDefault ()
#13 0x00000000005855a4 in ?? ()
#14 0x0000000000585174 in ?? ()
#15 0x000000000056a381 in PyObject_Call ()
#16 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#17 0x00000000005855a4 in ?? ()
#18 0x0000000000585174 in ?? ()
#19 0x000000000056a381 in PyObject_Call ()
#20 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#21 0x000000000055c931 in _PyFunction_Vectorcall ()
#22 0x000000000056a381 in PyObject_Call ()
#23 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#24 0x000000000055c931 in _PyFunction_Vectorcall ()
#25 0x000000000056a381 in PyObject_Call ()
#26 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#27 0x000000000055c931 in _PyFunction_Vectorcall ()
#28 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#29 0x000000000055c931 in _PyFunction_Vectorcall ()
#30 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#31 0x000000000055c931 in _PyFunction_Vectorcall ()
#32 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#33 0x000000000055c931 in _PyFunction_Vectorcall ()
#34 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#35 0x000000000055c931 in _PyFunction_Vectorcall ()
#36 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#37 0x000000000055c931 in _PyFunction_Vectorcall ()
#38 0x00007f6096f231a8 in __pyx_f_6llfuse_fuse_opendir (__pyx_v_req=0x7f6048120320, __pyx_v_ino=<optimized out>, __pyx_v_fi=0x7f60537fdb90)
    at src/llfuse.c:22149
#39 0x00007f6097379266 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#40 0x00007f6097378093 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#41 0x00007f6096efd887 in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f608018e910, __pyx_v_size=135168) at src/llfuse.c:44915
#42 0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f60800014f0) at src/llfuse.c:45067
#43 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#44 0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0  0x00007f6097fe905f in __GI___poll (fds=0x7f60537fc1c8, nfds=1, timeout=299931) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f6096da4c63 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#2  0x00007f6096da4cd5 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#3  0x00000000005230d0 in ?? ()
#4  0x000000000053ac2c in PyObject_Vectorcall ()
#5  0x000000000052b940 in _PyEval_EvalFrameDefault ()
#6  0x000000000055c931 in _PyFunction_Vectorcall ()
#7  0x0000000000522586 in PyObject_VectorcallMethod ()
#8  0x0000000000653120 in ?? ()
#9  0x00000000006532b2 in ?? ()
#10 0x000000000067693f in ?? ()
#11 0x00000000006767ec in ?? ()
#12 0x000000000052c556 in _PyEval_EvalFrameDefault ()
#13 0x00000000005855a4 in ?? ()
#14 0x0000000000585174 in ?? ()
#15 0x000000000056a381 in PyObject_Call ()
#16 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#17 0x00000000005855a4 in ?? ()
#18 0x0000000000585174 in ?? ()
#19 0x000000000056a381 in PyObject_Call ()
#20 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#21 0x000000000055c931 in _PyFunction_Vectorcall ()
#22 0x000000000056a381 in PyObject_Call ()
#23 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#24 0x000000000055c931 in _PyFunction_Vectorcall ()
#25 0x000000000056a381 in PyObject_Call ()
#26 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#27 0x000000000055c931 in _PyFunction_Vectorcall ()
#28 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#29 0x000000000055c931 in _PyFunction_Vectorcall ()
#30 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#31 0x000000000055c931 in _PyFunction_Vectorcall ()
#32 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#33 0x000000000055c931 in _PyFunction_Vectorcall ()
#34 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#35 0x000000000055c931 in _PyFunction_Vectorcall ()
#36 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#37 0x000000000055c931 in _PyFunction_Vectorcall ()
#38 0x00007f6096f231a8 in __pyx_f_6llfuse_fuse_opendir (__pyx_v_req=0x7f6048120320, __pyx_v_ino=<optimized out>, __pyx_v_fi=0x7f60537fdb90)
    at src/llfuse.c:22149
#39 0x00007f6097379266 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#40 0x00007f6097378093 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#41 0x00007f6096efd887 in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f608018e910, __pyx_v_size=135168) at src/llfuse.c:44915
#42 0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f60800014f0) at src/llfuse.c:45067
#43 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#44 0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 19
[Switching to thread 19 (Thread 0x7f6052ffd6c0 (LWP 233004))]
#0  __GI___libc_read (nbytes=135168, buf=0x7f60801afa80, fd=261) at ../sysdeps/unix/sysv/linux/read.c:26
26    ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
(gdb) bt
#0  __GI___libc_read (nbytes=135168, buf=0x7f60801afa80, fd=261) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __GI___libc_read (fd=261, buf=0x7f60801afa80, nbytes=135168) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f6097374b01 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#3  0x00007f609737651c in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#4  0x00007f6096efd85a in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f60801afa80, __pyx_v_size=135168) at src/llfuse.c:44766
#5  0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f6080001518) at src/llfuse.c:45067
#6  0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7  0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0  0x00007f6097fe905f in __GI___poll (fds=0x7f6033ffcea8, nfds=1, timeout=300000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f6096da4c63 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#2  0x00007f6096da4cd5 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#3  0x00000000005230d0 in ?? ()
#4  0x000000000053ac2c in PyObject_Vectorcall ()
#5  0x000000000052b940 in _PyEval_EvalFrameDefault ()
#6  0x000000000055c931 in _PyFunction_Vectorcall ()
#7  0x0000000000522586 in PyObject_VectorcallMethod ()
#8  0x0000000000653120 in ?? ()
#9  0x00000000006532b2 in ?? ()
#10 0x000000000067693f in ?? ()
#11 0x00000000006767ec in ?? ()
#12 0x000000000052c556 in _PyEval_EvalFrameDefault ()
#13 0x00000000005855a4 in ?? ()
#14 0x0000000000585174 in ?? ()
#15 0x000000000056a381 in PyObject_Call ()
#16 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#17 0x00000000005855a4 in ?? ()
#18 0x0000000000585174 in ?? ()
#19 0x000000000056a381 in PyObject_Call ()
#20 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#21 0x000000000055c931 in _PyFunction_Vectorcall ()
#22 0x000000000056a381 in PyObject_Call ()
#23 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#24 0x000000000055c931 in _PyFunction_Vectorcall ()
#25 0x000000000056a381 in PyObject_Call ()
#26 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#27 0x000000000055c931 in _PyFunction_Vectorcall ()
#28 0x000000000051f186 in _PyObject_FastCallDictTstate ()
#29 0x0000000000565973 in ?? ()
#30 0x0000000000517e55 in _PyObject_MakeTpCall ()
#31 0x000000000052b940 in _PyEval_EvalFrameDefault ()
#32 0x000000000055c931 in _PyFunction_Vectorcall ()
#33 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#34 0x000000000055c931 in _PyFunction_Vectorcall ()
#35 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#36 0x000000000055c931 in _PyFunction_Vectorcall ()
#37 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#38 0x000000000055c931 in _PyFunction_Vectorcall ()
#39 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#40 0x000000000055c931 in _PyFunction_Vectorcall ()
#41 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#42 0x000000000055c931 in _PyFunction_Vectorcall ()
#43 0x00007f6096f231a8 in __pyx_f_6llfuse_fuse_opendir (__pyx_v_req=0x7f601c134470, __pyx_v_ino=<optimized out>, __pyx_v_fi=0x7f6033ffeb90)
    at src/llfuse.c:22149
#44 0x00007f6097379266 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#45 0x00007f6097378093 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#46 0x00007f6096efd887 in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f6080276320, __pyx_v_size=135168) at src/llfuse.c:44915
#47 0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f6080001608) at src/llfuse.c:45067
#48 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#49 0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0  0x00007f6097fe905f in __GI___poll (fds=0x7f60337fbe78, nfds=1, timeout=300000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f6096da4c63 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#2  0x00007f6096da4cd5 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#3  0x00000000005230d0 in ?? ()
#4  0x000000000053ac2c in PyObject_Vectorcall ()
#5  0x000000000052b940 in _PyEval_EvalFrameDefault ()
#6  0x000000000055c931 in _PyFunction_Vectorcall ()
#7  0x0000000000522586 in PyObject_VectorcallMethod ()
#8  0x0000000000653120 in ?? ()
#9  0x00000000006532b2 in ?? ()
#10 0x000000000067693f in ?? ()
#11 0x00000000006767ec in ?? ()
#12 0x000000000052c556 in _PyEval_EvalFrameDefault ()
#13 0x00000000005855a4 in ?? ()
#14 0x0000000000585174 in ?? ()
#15 0x000000000056a381 in PyObject_Call ()
#16 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#17 0x00000000005855a4 in ?? ()
#18 0x0000000000585174 in ?? ()
#19 0x000000000056a381 in PyObject_Call ()
#20 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#21 0x000000000055c931 in _PyFunction_Vectorcall ()
#22 0x000000000056a381 in PyObject_Call ()
#23 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#24 0x000000000055c931 in _PyFunction_Vectorcall ()
#25 0x000000000056a381 in PyObject_Call ()
#26 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#27 0x000000000058153f in ?? ()
#28 0x000000000052b732 in _PyEval_EvalFrameDefault ()
#29 0x000000000058153f in ?? ()
#30 0x000000000057fd65 in ?? ()
#31 0x000000000052e7aa in _PyEval_EvalFrameDefault ()
#32 0x000000000055c931 in _PyFunction_Vectorcall ()
#33 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#34 0x000000000055c931 in _PyFunction_Vectorcall ()
#35 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#36 0x000000000055c931 in _PyFunction_Vectorcall ()
#37 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#38 0x000000000055c931 in _PyFunction_Vectorcall ()
#39 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#40 0x000000000055c931 in _PyFunction_Vectorcall ()
#41 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#42 0x000000000055c931 in _PyFunction_Vectorcall ()
#43 0x00007f6096f231a8 in __pyx_f_6llfuse_fuse_opendir (__pyx_v_req=0x7f6014001b50, __pyx_v_ino=<optimized out>, __pyx_v_fi=0x7f60337fdb90)
    at src/llfuse.c:22149
#44 0x00007f6097379266 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#45 0x00007f6097378093 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#46 0x00007f6096efd887 in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f6080297490, __pyx_v_size=135168) at src/llfuse.c:44915
#47 0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f6080001630) at src/llfuse.c:45067
#48 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#49 0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0  0x00007f6097fe905f in __GI___poll (fds=0x7f60327f9ea8, nfds=1, timeout=300000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f6096da4c63 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#2  0x00007f6096da4cd5 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#3  0x00000000005230d0 in ?? ()
#4  0x000000000053ac2c in PyObject_Vectorcall ()
#5  0x000000000052b940 in _PyEval_EvalFrameDefault ()
#6  0x000000000055c931 in _PyFunction_Vectorcall ()
#7  0x0000000000522586 in PyObject_VectorcallMethod ()
#8  0x0000000000653120 in ?? ()
#9  0x00000000006532b2 in ?? ()
#10 0x000000000067693f in ?? ()
#11 0x00000000006767ec in ?? ()
#12 0x000000000052c556 in _PyEval_EvalFrameDefault ()
#13 0x00000000005855a4 in ?? ()
#14 0x0000000000585174 in ?? ()
#15 0x000000000056a381 in PyObject_Call ()
#16 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#17 0x00000000005855a4 in ?? ()
#18 0x0000000000585174 in ?? ()
#19 0x000000000056a381 in PyObject_Call ()
#20 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#21 0x000000000055c931 in _PyFunction_Vectorcall ()
#22 0x000000000056a381 in PyObject_Call ()
#23 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#24 0x000000000055c931 in _PyFunction_Vectorcall ()
#25 0x000000000056a381 in PyObject_Call ()
#26 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#27 0x000000000055c931 in _PyFunction_Vectorcall ()
#28 0x000000000051f186 in _PyObject_FastCallDictTstate ()
#29 0x0000000000565973 in ?? ()
#30 0x0000000000517e55 in _PyObject_MakeTpCall ()
#31 0x000000000052b940 in _PyEval_EvalFrameDefault ()
#32 0x000000000055c931 in _PyFunction_Vectorcall ()
#33 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#34 0x000000000055c931 in _PyFunction_Vectorcall ()
#35 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#36 0x000000000055c931 in _PyFunction_Vectorcall ()
#37 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#38 0x000000000055c931 in _PyFunction_Vectorcall ()
#39 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#40 0x000000000055c931 in _PyFunction_Vectorcall ()
#41 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#42 0x000000000055c931 in _PyFunction_Vectorcall ()
#43 0x00007f6096f231a8 in __pyx_f_6llfuse_fuse_opendir (__pyx_v_req=0x7f600c1345f0, __pyx_v_ino=<optimized out>, __pyx_v_fi=0x7f60327fbb90)
    at src/llfuse.c:22149
#44 0x00007f6097379266 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#45 0x00007f6097378093 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#46 0x00007f6096efd887 in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f60802d9770, __pyx_v_size=135168) at src/llfuse.c:44915
#47 0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f6080001680) at src/llfuse.c:45067
#48 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#49 0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0  0x00007f6097fe905f in __GI___poll (fds=0x7f602affaea8, nfds=1, timeout=300000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f6096da4c63 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#2  0x00007f6096da4cd5 in ?? () from /usr/lib/python3.11/lib-dynload/_ssl.cpython-311-x86_64-linux-gnu.so
#3  0x00000000005230d0 in ?? ()
#4  0x000000000053ac2c in PyObject_Vectorcall ()
#5  0x000000000052b940 in _PyEval_EvalFrameDefault ()
#6  0x000000000055c931 in _PyFunction_Vectorcall ()
#7  0x0000000000522586 in PyObject_VectorcallMethod ()
#8  0x0000000000653120 in ?? ()
#9  0x00000000006532b2 in ?? ()
#10 0x000000000067693f in ?? ()
#11 0x00000000006767ec in ?? ()
#12 0x000000000052c556 in _PyEval_EvalFrameDefault ()
#13 0x00000000005855a4 in ?? ()
#14 0x0000000000585174 in ?? ()
#15 0x000000000056a381 in PyObject_Call ()
#16 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#17 0x00000000005855a4 in ?? ()
#18 0x0000000000585174 in ?? ()
#19 0x000000000056a381 in PyObject_Call ()
#20 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#21 0x000000000055c931 in _PyFunction_Vectorcall ()
#22 0x000000000056a381 in PyObject_Call ()
#23 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#24 0x000000000055c931 in _PyFunction_Vectorcall ()
#25 0x000000000056a381 in PyObject_Call ()
#26 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#27 0x000000000055c931 in _PyFunction_Vectorcall ()
#28 0x000000000051f186 in _PyObject_FastCallDictTstate ()
#29 0x0000000000565973 in ?? ()
#30 0x0000000000517e55 in _PyObject_MakeTpCall ()
#31 0x000000000052b940 in _PyEval_EvalFrameDefault ()
#32 0x000000000055c931 in _PyFunction_Vectorcall ()
#33 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#34 0x000000000055c931 in _PyFunction_Vectorcall ()
#35 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#36 0x000000000055c931 in _PyFunction_Vectorcall ()
#37 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#38 0x000000000055c931 in _PyFunction_Vectorcall ()
#39 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#40 0x000000000055c931 in _PyFunction_Vectorcall ()
#41 0x000000000052f8a2 in _PyEval_EvalFrameDefault ()
#42 0x000000000055c931 in _PyFunction_Vectorcall ()
#43 0x00007f6096f231a8 in __pyx_f_6llfuse_fuse_opendir (__pyx_v_req=0x7f600012e4f0, __pyx_v_ino=<optimized out>, __pyx_v_fi=0x7f602affcb90)
    at src/llfuse.c:22149
#44 0x00007f6097379266 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#45 0x00007f6097378093 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#46 0x00007f6096efd887 in __pyx_f_6llfuse_session_loop (__pyx_v_mem=0x7f608037eea0, __pyx_v_size=135168) at src/llfuse.c:44915
#47 0x00007f6096f01b01 in __pyx_f_6llfuse_worker_start (__pyx_v_data=0x7f6080001748) at src/llfuse.c:45067
#48 0x00007f6097f76044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#49 0x00007f6097ff661c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Actions #9

Updated by Peter Amstutz about 2 months ago

I don't know if this is a thing but I found this ticket that mentions the possibility of using binary wheels and multiple copies of libssl that step on each other...

https://github.com/psycopg/psycopg2/issues/543

I don't have any evidence that this is happening, though.

Actions #10

Updated by Peter Amstutz about 2 months ago

I'm strongly suspicious that this is some kind of SSL shared state bug in the Python binding maybe?

Actions #11

Updated by Peter Amstutz about 2 months ago

Possibly related, this also makes it seem like SSL could be stepping on itself

Mar 06 20:19:08 curiipeter arv-mount[271931]: 2024-03-06 20:19:08 arvados.arvados_fuse[271931] ERROR: Unhandled exception during FUSE operation
Mar 06 20:19:08 curiipeter arv-mount[271931]: Traceback (most recent call last):
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/usr/lib/python3.11/http/client.py", line 565, in _get_chunk_left
Mar 06 20:19:08 curiipeter arv-mount[271931]:     chunk_left = self._read_next_chunk_size()
Mar 06 20:19:08 curiipeter arv-mount[271931]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/usr/lib/python3.11/http/client.py", line 532, in _read_next_chunk_size
Mar 06 20:19:08 curiipeter arv-mount[271931]:     return int(line, 16)
Mar 06 20:19:08 curiipeter arv-mount[271931]:            ^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]: ValueError: invalid literal for int() with base 16: b'\x17s\x8e\x80\x14\xdc\xc5CY\xf0o\x86\x05\xae\xc3\xae\xbd                 \\"next_page_toke>
Mar 06 20:19:08 curiipeter arv-mount[271931]: During handling of the above exception, another exception occurred:
Mar 06 20:19:08 curiipeter arv-mount[271931]: Traceback (most recent call last):
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/usr/lib/python3.11/http/client.py", line 582, in _read_chunked
Mar 06 20:19:08 curiipeter arv-mount[271931]:     chunk_left = self._get_chunk_left()
Mar 06 20:19:08 curiipeter arv-mount[271931]:                  ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/usr/lib/python3.11/http/client.py", line 567, in _get_chunk_left
Mar 06 20:19:08 curiipeter arv-mount[271931]:     raise IncompleteRead(b'')
Mar 06 20:19:08 curiipeter arv-mount[271931]: http.client.IncompleteRead: IncompleteRead(0 bytes read)
Mar 06 20:19:08 curiipeter arv-mount[271931]: The above exception was the direct cause of the following exception:
Mar 06 20:19:08 curiipeter arv-mount[271931]: Traceback (most recent call last):
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/__init__.py", line 342, in catch_exceptions_wrapper
Mar 06 20:19:08 curiipeter arv-mount[271931]:     return orig_func(self, *args, **kwargs)
Mar 06 20:19:08 curiipeter arv-mount[271931]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/__init__.py", line 587, in lookup
Mar 06 20:19:08 curiipeter arv-mount[271931]:     elif isinstance(p, Directory) and name in p:
Mar 06 20:19:08 curiipeter arv-mount[271931]:                                       ^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/fusedir.py", line 1186, in __contains__
Mar 06 20:19:08 curiipeter arv-mount[271931]:     self[k]
Mar 06 20:19:08 curiipeter arv-mount[271931]:     ~~~~^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/fresh.py", line 25, in use_counter_wrapper
Mar 06 20:19:08 curiipeter arv-mount[271931]:     return orig_func(self, *args, **kwargs)
Mar 06 20:19:08 curiipeter arv-mount[271931]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/fresh.py", line 33, in check_update_wrapper
Mar 06 20:19:08 curiipeter arv-mount[271931]:     self.checkupdate()
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/fusedir.py", line 125, in checkupdate
Mar 06 20:19:08 curiipeter arv-mount[271931]:     self.update()
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/fresh.py", line 25, in use_counter_wrapper
Mar 06 20:19:08 curiipeter arv-mount[271931]:     return orig_func(self, *args, **kwargs)
Mar 06 20:19:08 curiipeter arv-mount[271931]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/fusedir.py", line 1113, in update
Mar 06 20:19:08 curiipeter arv-mount[271931]:     contents.extend(obj for obj in arvados.util.keyset_list_all(
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/services/fuse/arvados_fuse/fusedir.py", line 1113, in <genexpr>
Mar 06 20:19:08 curiipeter arv-mount[271931]:     contents.extend(obj for obj in arvados.util.keyset_list_all(
Mar 06 20:19:08 curiipeter arv-mount[271931]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/sdk/python/arvados/util.py", line 224, in keyset_list_all
Mar 06 20:19:08 curiipeter arv-mount[271931]:     items = fn(**kwargs).execute(num_retries=num_retries)
Mar 06 20:19:08 curiipeter arv-mount[271931]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/scripts/venv3/lib/python3.11/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
Mar 06 20:19:08 curiipeter arv-mount[271931]:     return wrapped(*args, **kwargs)
Mar 06 20:19:08 curiipeter arv-mount[271931]:            ^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/scripts/venv3/lib/python3.11/site-packages/googleapiclient/http.py", line 923, in execute
Mar 06 20:19:08 curiipeter arv-mount[271931]:     resp, content = _retry_request(
Mar 06 20:19:08 curiipeter arv-mount[271931]:                     ^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/sdk/python/arvados/api.py", line 73, in _retry_request
Mar 06 20:19:08 curiipeter arv-mount[271931]:     response, body = _orig_retry_request(http, num_retries, *args, **kwargs)
Mar 06 20:19:08 curiipeter arv-mount[271931]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/scripts/venv3/lib/python3.11/site-packages/googleapiclient/http.py", line 191, in _retry_request
Mar 06 20:19:08 curiipeter arv-mount[271931]:     resp, content = http.request(uri, method, *args, **kwargs)
Mar 06 20:19:08 curiipeter arv-mount[271931]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/arvados/sdk/python/arvados/api.py", line 102, in _intercept_http_request
Mar 06 20:19:08 curiipeter arv-mount[271931]:     response, body = self.orig_http_request(uri, method, headers=headers, **kwargs)
Mar 06 20:19:08 curiipeter arv-mount[271931]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/scripts/venv3/lib/python3.11/site-packages/httplib2/__init__.py", line 1711, in request
Mar 06 20:19:08 curiipeter arv-mount[271931]:     (response, content) = self._request(
Mar 06 20:19:08 curiipeter arv-mount[271931]:                           ^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/scripts/venv3/lib/python3.11/site-packages/httplib2/__init__.py", line 1427, in _request
Mar 06 20:19:08 curiipeter arv-mount[271931]:     (response, content) = self._conn_request(conn, request_uri, method, body, headers)
Mar 06 20:19:08 curiipeter arv-mount[271931]:                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/home/peter/work/scripts/venv3/lib/python3.11/site-packages/httplib2/__init__.py", line 1408, in _conn_request
Mar 06 20:19:08 curiipeter arv-mount[271931]:     content = response.read()
Mar 06 20:19:08 curiipeter arv-mount[271931]:               ^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/usr/lib/python3.11/http/client.py", line 459, in read
Mar 06 20:19:08 curiipeter arv-mount[271931]:     return self._read_chunked(amt)
Mar 06 20:19:08 curiipeter arv-mount[271931]:            ^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 20:19:08 curiipeter arv-mount[271931]:   File "/usr/lib/python3.11/http/client.py", line 597, in _read_chunked
Mar 06 20:19:08 curiipeter arv-mount[271931]:     raise IncompleteRead(b''.join(value)) from exc
Mar 06 20:19:08 curiipeter arv-mount[271931]: http.client.IncompleteRead: IncompleteRead(3626 bytes read)
Actions #12

Updated by Brett Smith about 2 months ago

Peter Amstutz wrote in #note-9:

I don't know if this is a thing but I found this ticket that mentions the possibility of using binary wheels and multiple copies of libssl that step on each other...

https://github.com/psycopg/psycopg2/issues/543

I don't have any evidence that this is happening, though.

I'm not ruling anything out, but just adding: when we accidentally did this in #20846 (see the actual ticket description), the crash was immediate. As soon as you tried to connect to the database from the Pg gem, linked against a different version of OpenSSL than Ruby itself, it crashed. Presumably because the Pg client code was passing around structs with different sizes/shapes than the Ruby code.

I don't have a mental model for how a problem like this would not be a problem for serial access but only cause issues when you started doing concurrent access.

Actions #13

Updated by Brett Smith about 2 months ago

Spitballing: the whole reason we have ThreadSafeApiCache is because http.client is not threadsafe. Is it possible our FUSE code is doing something that could undermine that protection?

  • Is it sharing a client object across a non-thread boundary like a fork?
  • Is there code that passes around the underlying HTTP client object directly? That would bypass the coordination of ThreadSafeApiCache and potentially let multiple threads access it.
    • Note that the problem doesn't seem to be "API client and Keep client share an HTTP client" because we can trigger the crash without ever fetching data from Keep.
  • We could start looking for bugs in ThreadSafeApiCache to see if we can orchestrate situations where it doesn't work as intended. We could maybe write a stress test for it directly.

I'm suggesting all this without looking at the FUSE code, kind of on purpose. Like, off the top of my head, I'm not aware that FUSE does any kind of multiprocessing besides threading, so I don't think #1 is an option. But I would rather make a suggestion and then rule it out rather than make a bad assumption and leave a stone unturned.

Actions #14

Updated by Peter Amstutz about 2 months ago

  • Status changed from New to In Progress
Actions #15

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Development 2024-03-13 sprint to Development 2024-03-27 sprint
Actions #16

Updated by Peter Amstutz about 2 months ago

  • Release set to 69
Actions #17

Updated by Peter Amstutz about 1 month ago

This will be fixed as part of a bigger set of fixes on #21541

For posterity, this ended up being 100% our code mistakenly sharing a http2 object, by binding the api.config function early so all the other threads ended up using the same object, instead of going through ThreadsafeApiCache and getting private http2 objects as intended.

The solution was to not do that, and instead pull out the one single config item we actually care about, instead of passing around a whole callable object with all of the unbounded behavior that entails.

Actions #18

Updated by Peter Amstutz about 1 month ago

  • Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Actions #19

Updated by Peter Amstutz 25 days ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF