Bug #10629

[FUSE] slow enumerating files by collection uuid

Added by Tom Morris 11 months ago. Updated 8 months ago.

Status:ResolvedStart date:11/24/2016
Priority:NormalDue date:
Assignee:Peter Amstutz% Done:

100%

Category:Performance
Target version:2017-03-01 sprint
Story points1.0Remaining (hours)0.00 hour
Velocity based estimate0 days

Description

It takes 38 minutes (!!) to enumerate 242K files. During this time the arv-mount process is pegged at 100% CPU and no network traffic is being done. The manifest text totals just 9MB (9243914 char), so it's taking over 4 min / MB to parse this.

$ time find keep/by_id/e51c5-4zz18-l3dq8bw20uwz0qd -print | wc -l
241751
real 38m0.969s
user 0m0.224s
sys 0m0.300s

$ wc *.manifest
2497 252893 9243914 e51c5-4zz18-l3dq8bw20uwz0qd.manifest


Subtasks

Task #11114: Review 10629-fuse-listing-perfResolvedPeter Amstutz


Related issues

Related to Arvados - Bug #6019: [FUSE] `ls` in a large home directory (4400+ items) takes... Closed 05/13/2015
Related to Arvados - Bug #9732: [FUSE] performance issues stat() in python New 08/04/2016

Associated revisions

Revision 9b2aa422
Added by Peter Amstutz 8 months ago

Merge branch '10629-fuse-listing-perf' closes #10629

History

#1 Updated by Tom Clegg 11 months ago

Suggested places to start
  • debug trace - which / how many fuse operations per file
  • make a test manifest big enough to exhibit slowness (10K files?), and try
    • squishing dir hierarchy (is slowness related to dir depth?)
    • all files in one dir (is slowness related to # files per dir?)
  • double the # files and see how that affects timing (is it O(N) or O(N^2)?)

#2 Updated by Tom Clegg 11 months ago

  • Story points set to 1.0

#3 Updated by Tom Morris 8 months ago

  • Assignee set to Peter Amstutz
  • Target version changed from Arvados Future Sprints to 2017-03-01 sprint

#4 Updated by Peter Amstutz 8 months ago

peteramstutz@shell:~$ time find keep/by_id/83325435ac6cf1a851f4e1aadf4df0e3+8675570 -print | wc -l
241751

real    0m3.723s
user    0m0.124s
sys    0m0.180s

So the problem is that there is different behavior for collections accessed by UUID vs. by PDH. It seems to be doing some expensive synchronization operation which is elided for PDH (which is immutable).

#5 Updated by Peter Amstutz 8 months ago

  • Subject changed from arv-mount pathologically slow enumerating files to [FUSE] slow enumerating files by collection uuid

#6 Updated by Peter Amstutz 8 months ago

  • Status changed from New to In Progress
class Handle(object):
    """Connects a numeric file handle to a File or Directory object that has
    been opened by the client.""" 

    def flush(self):
        if self.obj.writable():
            return self.obj.flush()

Several problems here.

  1. Opendir and releasedir are only ever used to get the directory listing via readdir(). Because a directory handle isn't used to modify the directory, calling flush() is spurious.
  2. If the Operations() object was created with enable_write=False, calling flush() is spurious.
  3. The CollectionDirectory object is considered "writable" despite enable_write=False
  4. Finally, computing committed() (to decide whether to actually send and updated manifest to the server) checks the _committed flag on every object. When there are 240000 files, that is expensive (especially because it makes a function call and increments/decrements a recursive mutex at each node.)

Todo:

  1. Fix set_committed() to accept True or False and propagate the flag up or down accordingly. Change committed() to only test the local flag.
  2. Don't flush directory handles at all.

#7 Updated by Peter Amstutz 8 months ago

10629-fuse-listing-perf ready for review.

#8 Updated by Lucas Di Pentima 8 months ago

LGTM.
Local sdk/python and services/fuse tests ran without issues.
Tried to do some benchmarking using arvbox but wasn't able to start it, I don't want to stall this review any longer, if you have timings after the fix it would be nice to have the comparison here.

#9 Updated by Peter Amstutz 8 months ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:9b2aa42213a7d333bbe93e040c2d152a70e9b5af.

Also available in: Atom PDF