Bug #19872
Too many open files error (Closed)
Description
While running a test case that runs "zcat" on a bunch of gzipped fastq files, the following error appears after a while:
2022-12-12 17:12:52 arvados.api[8871] DEBUG: [req-49kou2g0vgw59vt7dbt1] Retrying API request in 4 s after socket error
Traceback (most recent call last):
  File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/arvados_python_client-2.5.0.dev20221202182828-py3.8.egg/arvados/api.py", line 88, in _intercept_http_request
  File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/httplib2-0.20.1-py3.8.egg/httplib2/__init__.py", line 1711, in request
  File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/httplib2-0.20.1-py3.8.egg/httplib2/__init__.py", line 1427, in _request
  File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/httplib2-0.20.1-py3.8.egg/httplib2/__init__.py", line 1349, in _conn_request
  File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/httplib2-0.20.1-py3.8.egg/httplib2/__init__.py", line 1125, in connect
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/socket.py", line 918, in getaddrinfo
OSError: [Errno 24] Too many open files
The most likely explanation is that Python isn't garbage collecting the mmap'd keep cache blocks as expected -- need to investigate.
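One way to investigate is to watch the process's open descriptor count while the workload runs. A minimal sketch, assuming a Linux system where open descriptors are listed under /proc/self/fd; this is an illustrative diagnostic, not part of the reported test case:

    import os

    def open_fd_count(pid="self"):
        # Count the file descriptors currently open in this process (Linux only).
        return len(os.listdir(f"/proc/{pid}/fd"))

    # Print this periodically (e.g. before and after reading each file) to see
    # whether descriptors pile up instead of being released when cache blocks
    # should have been garbage collected or evicted.
    print("open fds:", open_fd_count())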
Updated by Peter Amstutz about 2 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz about 2 years ago
19872-mnt-cache-limits @ 9f7c39451c16003c6c6e0fb8de5a990781cb300f
- Reduce max slots to 3/8 of max fds instead of 1/2, because mmap() uses a second file descriptor and we keep the original file descriptor open for flock()
- Rework how cache slots are allocated to try evicting things before allocating a new cache slot, so the cache should be somewhat better behaved about staying within its configured limits (see the sketch below)
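The general shape of that change can be sketched as follows. This is an illustrative sketch, not the actual arvados-python-client code: the class and helper names are made up, and it assumes each cached block holds two descriptors (the flock()'d file plus its mmap) and that load_block returns an object with a close() method.

    import resource
    from collections import OrderedDict

    class DiskCacheSketch:
        def __init__(self):
            # Budget 3/8 of the NOFILE soft limit for cache slots, since each
            # slot may hold two descriptors (the flock()'d file plus its mmap).
            soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            self.max_slots = max(1, (soft * 3) // 8)
            self.slots = OrderedDict()   # block locator -> cached block

        def get_slot(self, locator, load_block):
            if locator in self.slots:
                self.slots.move_to_end(locator)
                return self.slots[locator]
            # Evict least-recently-used blocks *before* allocating a new slot,
            # so the cache stays within its configured limit.
            while len(self.slots) >= self.max_slots:
                _old, block = self.slots.popitem(last=False)
                block.close()
            block = load_block(locator)
            self.slots[locator] = block
            return block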
Updated by Peter Amstutz about 2 years ago
- File arvados-python-client-2.5.0.dev20221213224332.tar.gz added
- File arvados_fuse-2.5.0.dev20221213224332.tar.gz added
Test packages attached
Updated by Peter Amstutz about 2 years ago
As a side effect, I think I confirmed that Python is garbage collecting things as expected, so the only real problem was that I was not aware that every mmap() allocates an additional file descriptor. One consequence of the new limit is that the upper bound on the cache is now effectively 24 GiB instead of 32 GiB.
This can potentially be increased by increasing RLIMIT_NOFILE. Maybe we want to call setrlimit and adjust that up to 2048 or 4096?
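For reference, the 24 GiB vs 32 GiB figures follow from the slot budget, assuming the typical NOFILE soft limit of 1024 and 64 MiB Keep blocks: 1024 × 3/8 = 384 slots × 64 MiB = 24 GiB, versus the previous 1024 × 1/2 = 512 slots × 64 MiB = 32 GiB.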
Updated by Peter Amstutz about 2 years ago
Oh yeah, the user who reported this issue tested again with the packages above and reported it was fixed for them.
Updated by Tom Clegg about 2 years ago
Everyone seems to agree the default/typical NOFILE limit of 1024 is too low, and consuming 3/4 of it seems like a bit much. Having a client library adjust NOFILE seems a little weird, but at least arv-mount could raise the NOFILE limit to 10240 if it's lower than that (and log a warning if it can't be raised?), and sdk/python could limit _max_slots to NOFILE/8. That would leave a maximum 128-block / 8 GiB cache for most callers that keep the default NOFILE of 1024, which doesn't seem so bad, and probably a maximum of 80 GiB for arv-mount, which seems like plenty.
Ideally we would be able to control fd usage by closing the files without deleting them -- especially in the case where a process with NOFILE=1024 immediately deletes a lot of cache blocks that other processes with higher limits could still be using -- but it looks like that would involve more refactoring than it's worth at this point.
So, still room to improve, but even the current version is worth merging.
Updated by Peter Amstutz about 2 years ago
19872-mnt-cache-limits @ 4a832a93cd0baf253575936a79f83bcc4f666a82
- Module default is NOFILE/8, so the cache consumes up to 1/4 of available file descriptors (each slot uses two: one kept open for flock() and one for the mmap); see the sketch below
- arv-mount adjusts the rlimit to 10240
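Roughly, those two pieces could look like the sketch below. This is illustrative only, not the actual sdk/python or arv-mount code; the function names are made up, and the closing arithmetic assumes 64 MiB Keep blocks.

    import logging
    import resource

    logger = logging.getLogger(__name__)

    def default_cache_slots():
        # Module default: budget 1/8 of the NOFILE soft limit for cache slots,
        # so the cache consumes at most about 1/4 of available descriptors
        # (each slot holds the flock()'d file plus its mmap).
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        return max(1, soft // 8)

    def raise_nofile_for_mount(target=10240):
        # arv-mount: raise the soft NOFILE limit to `target` if it is lower,
        # logging a warning if the limit cannot be raised.
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft >= target:
            return soft
        new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError) as err:
            logger.warning("could not raise RLIMIT_NOFILE to %d: %s", new_soft, err)
            return soft
        return new_soft

    # With the default NOFILE of 1024 this yields 128 slots (~8 GiB of 64 MiB
    # blocks); after arv-mount raises the limit to 10240 it yields 1280 slots
    # (~80 GiB), matching the figures discussed above.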
Updated by Peter Amstutz about 2 years ago
- Status changed from In Progress to Resolved