Bug #10584

closed

[FUSE] high memory consumption (possible leak) in long-running arv-mount

Added by Joshua Randall almost 8 years ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
-
Story points:
-

Description

We have a (little used) arv-mount that has been running since 6th September.

It was started with the command line:
`/usr/bin/python2.7 /usr/bin/arv-mount /tmp/keep_jr17`

Since no `--file-cache` or `--directory-cache` options were given, the caches should be using their defaults of 256MiB and 128MiB. If I start a new arv-mount, also with defaults, then read some large data through it and exercise some large directories (such as running a find in `by_tag`), I can get memory usage up to 514MB, which seems reasonable.

However, the arv-mount that has been running for the past 77 days is now taking up 15GB of RAM!
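One way to track growth like this over time is to sample the process's resident set size from `/proc`. The following is a minimal sketch, assuming a Linux host; the helper names `parse_vmrss_kib` and `read_rss_kib` are made up for illustration and are not part of arv-mount.

```python
import re


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open("/proc/{}/status".format(pid)) as status_file:
        return parse_vmrss_kib(status_file.read())
```

Logging `read_rss_kib(pid)` at a fixed interval for the arv-mount process would show whether usage levels off near the configured cache sizes or keeps climbing.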

I suspect this issue might be related to the increasing memory usage I observed and reported in #10535 when the python SDK test suite got stuck in a tight PollClient loop forever (where "forever" is until it ran the system out of memory).


Subtasks 2 (0 open, 2 closed)

Task #11890: Review 10584-fuse-stop-threads | Resolved | Peter Amstutz | 11/22/2016
Task #12067: Instrumentation | Closed | Peter Amstutz | 04/03/2024
Actions #1

Updated by Tom Morris over 7 years ago

  • Subject changed from high memory consumption (possible leak) in long-running arv-mount to [FUSE] high memory consumption (possible leak) in long-running arv-mount
  • Target version set to Arvados Future Sprints
Actions #2

Updated by Tom Morris over 7 years ago

  • Target version changed from Arvados Future Sprints to 2017-07-05 sprint
Actions #3

Updated by Peter Amstutz over 7 years ago

  • Assigned To set to Peter Amstutz
Actions #4

Updated by Peter Amstutz over 7 years ago

Some theories:

  • This might be related to https://dev.arvados.org/issues/11158, the idea being that it tries to enumerate the entire home directory and uses up all memory storing the full contents.
  • Cache management releases unused Collection objects. However, those Collection objects may have prefetch threads. If the threads don't get stopped, they will leak.
Actions #5

Updated by Peter Amstutz over 7 years ago

  • Target version changed from 2017-07-05 sprint to 2017-07-19 sprint
Actions #6

Updated by Peter Amstutz over 7 years ago

10584-fuse-stop-threads

Ensure get/put threads are stopped before releasing reference to Collection object. Unclear if this is the source of the problem, but seems like a good idea regardless.
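To illustrate why an unstopped thread can keep an object alive, here is a minimal sketch. `PrefetchingCollection` and `survives_release` are hypothetical stand-ins, not the Arvados SDK classes or the code on the branch; the sketch only assumes that a worker thread holds a reference to its owner, as any thread running a bound method does in CPython.

```python
import gc
import queue
import threading
import weakref


class PrefetchingCollection:
    """Hypothetical stand-in for a Collection that owns a worker thread."""

    def __init__(self):
        self._requests = queue.Queue()
        # The thread's target is a bound method, so the running thread
        # holds a reference to this object.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            if self._requests.get() is None:  # sentinel: exit the loop
                return
            # A real prefetcher would fetch a block here.

    def stop_threads(self):
        """Ask the worker to exit and wait for it."""
        self._requests.put(None)
        self._worker.join()


def survives_release(stop_first):
    """Drop the only reference; report whether the object is still alive."""
    coll = PrefetchingCollection()
    ref = weakref.ref(coll)
    if stop_first:
        coll.stop_threads()
    del coll
    gc.collect()
    return ref() is not None
```

In this sketch, `survives_release(stop_first=False)` reports that the object outlives its last external reference because the blocked worker still points at it, while stopping the thread first lets it be collected.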

Actions #7

Updated by Lucas Di Pentima over 7 years ago

The thread-stopping code was added on a CollectionDirectoryBase subclass. Is it possible for this problem to happen with TmpCollectionDirectory objects too? Maybe it’s better to do the thread stopping on CollectionDirectoryBase?

Actions #8

Updated by Peter Amstutz about 7 years ago

Lucas Di Pentima wrote:

The thread-stopping code was added on a CollectionDirectoryBase subclass. Is it possible for this problem to happen with TmpCollectionDirectory objects too? Maybe it’s better to do the thread stopping on CollectionDirectoryBase?

CollectionDirectoryBase objects are used to hold Subcollection objects, which don't have a stop_threads() method.

TmpCollectionDirectory objects are not candidates for cache eviction (persisted() is False). The finalize() method already calls stop_threads().

The difference between clear() and finalize() is that clear() is called when we want to evict an inode's cached contents, whereas finalize() is called when the inode will be deleted entirely.
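The distinction can be sketched roughly as follows. `CachedDirectory` and `evict_if_allowed` are hypothetical names used only for illustration; this is not the arvados_fuse implementation, just a model of "evict contents but keep the inode" versus "tear the inode down".

```python
class CachedDirectory:
    """Hypothetical inode-like object; not the arvados_fuse implementation."""

    def __init__(self, loader, persisted=True):
        self._loader = loader        # callable that (re)builds the contents
        self._persisted = persisted
        self._entries = None         # None means "not currently cached"
        self.finalized = False

    def persisted(self):
        # Per the comment above, eviction skips objects where this is False.
        return self._persisted

    def entries(self):
        if self.finalized:
            raise RuntimeError("inode has been deleted")
        if self._entries is None:
            self._entries = self._loader()
        return self._entries

    def clear(self):
        """Evict cached contents; the inode stays valid and reloads on demand."""
        self._entries = None

    def finalize(self):
        """The inode is going away for good: release everything."""
        self._entries = None
        self._loader = None
        self.finalized = True


def evict_if_allowed(directory):
    """Cache eviction only touches objects whose persisted() is true."""
    if directory.persisted():
        directory.clear()
        return True
    return False
```

After `clear()`, the next call to `entries()` simply reloads the contents, whereas after `finalize()` the object is no longer usable.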

Actions #9

Updated by Lucas Di Pentima about 7 years ago

Ok, so this looks good to me. Thanks!

Actions #10

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-07-19 sprint to 2017-08-02 sprint
Actions #12

Updated by Peter Amstutz about 7 years ago

  • Look at user interaction history with keep
  • Track metrics
  • Instrumentation to report memory usage / ownership
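As one possible shape for the instrumentation item, the sketch below uses the standard library's `tracemalloc` module to report which source lines gained the most memory since a baseline. It assumes Python 3.4 or later (the arv-mount in the original report ran under Python 2.7, where `tracemalloc` is not available), and the function names are illustrative only.

```python
import tracemalloc


def start_tracking(frames=1):
    """Begin recording allocations and return a baseline snapshot."""
    if not tracemalloc.is_tracing():
        tracemalloc.start(frames)
    return tracemalloc.take_snapshot()


def growth_since(baseline, limit=5):
    """Return (location, bytes_gained) for the largest changes since baseline."""
    current = tracemalloc.take_snapshot()
    diffs = current.compare_to(baseline, "lineno")
    return [(str(d.traceback[0]), d.size_diff) for d in diffs[:limit]]


def traced_memory_mib():
    """Return (current, peak) traced memory in MiB."""
    current, peak = tracemalloc.get_traced_memory()
    return current / 2**20, peak / 2**20
```

Calling `growth_since(baseline)` periodically in a long-running process gives a rough picture of which allocation sites own the growth. Note that `tracemalloc` only sees memory allocated through Python's allocators, so memory held by C libraries would not show up here.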
Actions #13

Updated by Peter Amstutz about 7 years ago

  • Target version changed from 2017-08-02 sprint to 2017-08-16 sprint
Actions #14

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-08-16 sprint to 2017-08-30 Sprint
Actions #15

Updated by Peter Amstutz about 7 years ago

  • Assigned To deleted (Peter Amstutz)
  • Target version changed from 2017-08-30 Sprint to 2017-09-13 Sprint
Actions #16

Updated by Tom Morris about 7 years ago

  • Target version changed from 2017-09-13 Sprint to Arvados Future Sprints
Actions #17

Updated by Tom Clegg about 7 years ago

Might be worth running the "retry PUT" test many times in a row. At least once I've seen the test suite get stuck there using lots of memory.

Actions #18

Updated by Peter Amstutz about 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #19

Updated by Peter Amstutz over 1 year ago

  • Release set to 60
Actions #20

Updated by Peter Amstutz 7 months ago

  • Target version set to Future
Actions #21

Updated by Peter Amstutz 6 months ago

  • Target version deleted (Future)
  • Status changed from New to Closed
Actions #22

Updated by Peter Amstutz 6 months ago

  • Release deleted (60)

Closed as out of date, but recent improvements to arv-mount have reduced its memory usage over time.
