Bug #10584
[FUSE] high memory consumption (possible leak) in long-running arv-mount (Closed)
Description
We have a little-used arv-mount that has been running since 6th September.
It was started with the command line:
`/usr/bin/python2.7 /usr/bin/arv-mount /tmp/keep_jr17`
Since no `--file-cache` or `--directory-cache` options were given, those should have been the defaults of 256 MiB and 128 MiB. If I start a new arv-mount, also with the defaults, then read some large data through it and exercise some large directories (such as running `find` in `by_tag`), I can get memory usage up to 514 MB, which seems reasonable.
However, the arv-mount that has been running for the past 77 days is now taking up 15GB of RAM!
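One way to watch for the kind of growth described above is to sample the mount process's resident set size periodically. A minimal Python 3 sketch, assuming Linux (it reads `VmRSS` from `/proc/<pid>/status`); the helpers `rss_mib` and `over_budget_ratio` are illustrative names, and the budget constants are simply the defaults quoted in this report:

```python
import os

# Default cache limits quoted in this report (MiB); arv-mount was started without overrides.
FILE_CACHE_MIB = 256
DIRECTORY_CACHE_MIB = 128


def rss_mib(pid: int) -> float:
    """Return the resident set size of process `pid` in MiB (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB".
                return int(line.split()[1]) / 1024.0
    raise ValueError(f"no VmRSS entry for pid {pid}")


def over_budget_ratio(pid: int) -> float:
    """How many times larger the RSS is than the combined cache budget."""
    return rss_mib(pid) / (FILE_CACHE_MIB + DIRECTORY_CACHE_MIB)


if __name__ == "__main__":
    # Substitute the arv-mount PID here; this example inspects the current process.
    pid = os.getpid()
    print(f"RSS {rss_mib(pid):.1f} MiB, {over_budget_ratio(pid):.2f}x the cache budget")
```

For the mount in this report, 15 GB against a combined 384 MiB cache budget is an overshoot of roughly 40 times, whereas the 514 MB seen on a fresh mount also includes the interpreter's own baseline.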
I suspect this issue might be related to the increasing memory usage I observed and reported in #10535 when the Python SDK test suite got stuck in a tight PollClient loop forever (where "forever" is until it ran the system out of memory).
Updated by Tom Morris over 7 years ago
- Subject changed from high memory consumption (possible leak) in long-running arv-mount to [FUSE] high memory consumption (possible leak) in long-running arv-mount
- Target version set to Arvados Future Sprints
Updated by Tom Morris over 7 years ago
- Target version changed from Arvados Future Sprints to 2017-07-05 sprint
Updated by Peter Amstutz over 7 years ago
Some theories:
- This might be related to, or caused by, https://dev.arvados.org/issues/11158: arv-mount tries to enumerate the entire home directory and uses up all memory trying to store the full contents.
- Cache management releases unused Collection objects. However, those Collection objects may have prefetch threads; if the threads are not stopped, the objects will leak.
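The prefetch-thread theory can be illustrated with a toy object: a running thread whose target is a bound method keeps its owner reachable, so dropping the cache's reference frees nothing until the thread is stopped. A minimal Python 3 sketch; `FakeCollection` and `leak_demo` are hypothetical stand-ins, not the Arvados SDK `Collection` class:

```python
import gc
import threading
import weakref


class FakeCollection:
    """Hypothetical stand-in for a collection with a background prefetch thread."""

    def __init__(self):
        self._stop = threading.Event()
        # The target is a bound method, so the running thread holds a strong
        # reference to this object for as long as it runs.
        self._thread = threading.Thread(target=self._prefetch, daemon=True)
        self._thread.start()

    def _prefetch(self) -> None:
        while not self._stop.wait(0.01):
            pass  # a real prefetcher would fetch blocks here

    def stop_threads(self) -> None:
        self._stop.set()
        self._thread.join()


def leak_demo():
    coll = FakeCollection()
    ref = weakref.ref(coll)
    del coll  # the "cache" drops its reference without stopping threads
    gc.collect()
    leaked = ref() is not None  # still reachable through the running thread

    survivor = ref()
    survivor.stop_threads()  # once the thread exits, nothing else holds the object
    del survivor
    gc.collect()
    return leaked, ref() is None


if __name__ == "__main__":
    print(leak_demo())
```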
Updated by Peter Amstutz over 7 years ago
- Target version changed from 2017-07-05 sprint to 2017-07-19 sprint
Updated by Peter Amstutz over 7 years ago
Branch `10584-fuse-stop-threads`: ensure get/put threads are stopped before releasing the reference to the Collection object. It is unclear whether this is the source of the problem, but it seems like a good idea regardless.
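The ordering this branch enforces can be shown on the eviction path of a cache: stop a collection's background threads before the cache lets go of it. A minimal sketch of that ordering in a byte-budgeted LRU cache; `CollectionCache` and `StoppableCollection` are hypothetical and much simpler than arv-mount's real cache management:

```python
from collections import OrderedDict


class StoppableCollection:
    """Hypothetical stand-in that records whether stop_threads() was called."""

    def __init__(self, size: int):
        self.size = size
        self.stopped = False

    def stop_threads(self) -> None:
        self.stopped = True


class CollectionCache:
    """Hypothetical LRU cache that evicts least recently used entries over a byte budget."""

    def __init__(self, cap_bytes: int):
        self.cap_bytes = cap_bytes
        self.total = 0
        self._entries = OrderedDict()

    def __contains__(self, key) -> bool:
        return key in self._entries

    def touch(self, key, coll) -> None:
        """Insert or refresh an entry, then evict if over budget."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
        else:
            self._entries[key] = coll
            self.total += coll.size
        self._evict()

    def _evict(self) -> None:
        # Always keep the most recently used entry, even if it alone exceeds the budget.
        while self.total > self.cap_bytes and len(self._entries) > 1:
            _, coll = self._entries.popitem(last=False)
            # Stop get/put threads *before* the last reference is dropped;
            # otherwise a running thread would keep the object alive.
            coll.stop_threads()
            self.total -= coll.size
```

With a 100-byte budget, touching two 60-byte entries evicts the older one and stops its threads, leaving the newer one cached.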
Updated by Lucas Di Pentima over 7 years ago
The thread-stopping code was added on a `CollectionDirectoryBase` subclass. Is it possible for this problem to happen with `TmpCollectionDirectory` objects too? Maybe it's better to do the thread stopping on `CollectionDirectoryBase`?
Updated by Peter Amstutz about 7 years ago
Lucas Di Pentima wrote:
> The thread stopping code was added on a `CollectionDirectoryBase` subclass, is it possible for this problem to happen with `TmpCollectionDirectory` objects too? Maybe it's better to do the thread stopping on `CollectionDirectoryBase`?
`CollectionDirectoryBase` objects are used to hold `Subcollection` objects, which don't have a `stop_threads()` method.
`TmpCollectionDirectory` objects are not candidates for cache eviction (`persisted()` is False). The `finalize()` method already calls `stop_threads()`.
The difference between `clear()` and `finalize()` is that `clear()` is called when we want to evict an inode's cached contents, whereas `finalize()` is called when the inode will be deleted entirely.
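The distinction above amounts to two lifecycle hooks plus an eligibility check. A minimal sketch, assuming a simplified directory object; `DirectorySketch`, `StubCollection`, and `evict_all` are hypothetical names, and only `persisted()`, `clear()`, `finalize()`, and `stop_threads()` mirror methods mentioned in this thread:

```python
class StubCollection:
    """Hypothetical collection that counts stop_threads() calls."""

    def __init__(self):
        self.stop_calls = 0

    def stop_threads(self) -> None:
        self.stop_calls += 1


class DirectorySketch:
    """Hypothetical directory inode backed by a collection."""

    def __init__(self, persisted: bool):
        self._persisted = persisted
        self.collection = StubCollection()
        self.contents = {"example.txt": b"data"}  # cached listing
        self.deleted = False

    def persisted(self) -> bool:
        # Non-persisted (temporary) directories are never eviction candidates.
        return self._persisted

    def _release_collection(self) -> None:
        # Stop background threads before dropping the reference.
        if self.collection is not None:
            self.collection.stop_threads()
            self.collection = None

    def clear(self) -> None:
        # Evict cached contents; the inode itself stays in place.
        self._release_collection()
        self.contents = {}

    def finalize(self) -> None:
        # The inode is being deleted entirely.
        self._release_collection()
        self.contents = {}
        self.deleted = True


def evict_all(nodes) -> None:
    """Clear every node that is allowed to be evicted."""
    for node in nodes:
        if node.persisted():
            node.clear()
```

Because the temporary directory is skipped by `evict_all`, its threads are only stopped when `finalize()` runs, which matches the reasoning above for leaving `TmpCollectionDirectory` alone.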
Updated by Lucas Di Pentima about 7 years ago
Ok, so this looks good to me. Thanks!
Updated by Tom Morris about 7 years ago
- Target version changed from 2017-07-19 sprint to 2017-08-02 sprint
Updated by Peter Amstutz about 7 years ago
- Look at user interaction history with Keep
- Track metrics
- Instrumentation to report memory usage / ownership
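For the instrumentation item, Python's standard `tracemalloc` module can attribute allocation growth to source lines by diffing two snapshots. A minimal sketch; `top_growth` and `leaky_workload` are illustrative names. Note that `tracemalloc` only sees memory allocated through Python's allocators, not memory that C libraries allocate directly:

```python
import tracemalloc

retained = []  # simulates a structure that keeps growing and is never released


def leaky_workload() -> None:
    # Keep about 10 MiB alive after the function returns.
    retained.extend(bytearray(10 * 1024) for _ in range(1024))


def top_growth(workload, limit: int = 5):
    """Run `workload` under tracemalloc and return the largest per-line changes."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Group allocations by source line, largest change first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growth(leaky_workload):
        print(stat)
```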
Updated by Peter Amstutz about 7 years ago
- Target version changed from 2017-08-02 sprint to 2017-08-16 sprint
Updated by Tom Morris about 7 years ago
- Target version changed from 2017-08-16 sprint to 2017-08-30 Sprint
Updated by Peter Amstutz about 7 years ago
- Assigned To deleted (Peter Amstutz)
- Target version changed from 2017-08-30 Sprint to 2017-09-13 Sprint
Updated by Tom Morris about 7 years ago
- Target version changed from 2017-09-13 Sprint to Arvados Future Sprints
Updated by Tom Clegg about 7 years ago
Might be worth running the "retry PUT" test many times in a row. At least once I've seen the test suite get stuck there using lots of memory.
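Running one test many times in a row can be scripted in-process by recording peak memory after each run; steady growth across iterations points at state that survives between runs. A minimal sketch using the standard `resource` module (Unix only); `repeat_and_watch` is a hypothetical helper, and the function passed to it is a placeholder for the "retry PUT" test, whose actual name is not given here:

```python
import resource
import sys


def peak_rss_kib() -> int:
    """Peak resident set size of this process so far, in KiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak // 1024 if sys.platform == "darwin" else peak


def repeat_and_watch(test_fn, runs: int = 50):
    """Call `test_fn` `runs` times and return peak RSS (KiB) after each call."""
    samples = []
    for _ in range(runs):
        test_fn()
        samples.append(peak_rss_kib())
    return samples


if __name__ == "__main__":
    def placeholder_test():
        # Hypothetical stand-in; replace with the test under investigation.
        pass

    samples = repeat_and_watch(placeholder_test, runs=10)
    print(f"peak RSS grew by {samples[-1] - samples[0]} KiB over {len(samples)} runs")
```

Because `ru_maxrss` is a high-water mark it never decreases, so a flat series means no new peak was reached, while a steadily climbing one is the signal to investigate.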
Updated by Peter Amstutz about 3 years ago
- Target version deleted (Arvados Future Sprints)
Updated by Peter Amstutz 6 months ago
- Target version deleted (Future)
- Status changed from New to Closed
Updated by Peter Amstutz 6 months ago
- Release deleted (60)
Closed as out of date, though recent changes to arv-mount have improved its memory usage over time.