Bug #9732

[FUSE] performance issues stat() in python

Added by Nico C├ęsar over 4 years ago. Updated almost 4 years ago.

Assigned To:
Target version:
Start date:
Due date:
% Done:


Estimated time:
Story points:


Hi Curoverse,

we are seeing some performance issues accessing arv fuser mounts with python

from one of our users:

I noticed that - when mounting arvados keep - running os.stat is very slow.
I am running a workflow that that has 18 BAM files as input, each about 60Gb,
and a reference sequence of 3Gb.

The time it takes to build a workflow is 6 mins, almost all of this time is spent in calls to os.stat. To give an example:

  1. 2016-08-02 13:46:18,118 DEBUG DB: collected 0.368199110031 /local/scratch/benchmark/ctmpBa9g4h/shared/Open_Access/xxxxxxxxxxx.fa
  2. 2016-08-02 13:46:24,087 DEBUG DB: collected 5.96899294853 /local/scratch/benchmark/ctmpBa9g4h/shared/Open_Access/xxxxxxxxxxxxxxxxxxx.bam
  3. 2016-08-02 13:46:25,408 DEBUG DB: collected 1.32038497925 /local/scratch/benchmark/ctmpBa9g4h/shared/Open_Access/xxxxxxxxxxxxxxxxxxxx.fa
  4. 2016-08-02 13:46:37,359 DEBUG DB: collected 11.9504730701 /local/scratch/benchmark/ctmpBa9g4h/shared/Open_Access/xxxxxxxxxxxxxxxxxxxx.bam
  5. 2016-08-02 13:46:38,718 DEBUG DB: collected 1.35802483559 /local/scratch/benchmark/ctmpBa9g4h/shared/Open_Access/xxxxxxxxxxxxxxxxxxxx.fa
  6. 2016-08-02 13:46:50,784 DEBUG DB: collected 12.065251112 /local/scratch/benchmark/ctmpBa9g4h/shared/Open_Access/xxxxxxxxxxxxxxxxx.bam

Each log message contains a number and a filename. The number is the
time in seconds of the os.stat call to the filename. From these numbers I deduce:

1. The time for os.stat is proportional to the file size (compare bam versus fasta file)
2. The time for os.stat does not reduce (see repeated calls to fasta file)

This is on london. I am already caching within python all the os.stat calls.
It seems to me that stat() from the command line is not similarly affected.
The issue might very well be in the python posix module.

Do you - or the people at arvados - have any advice? I expect this could be a generic issue for anyone trying to build workflows from data in arvados keep and using file stamps.

As the user pointed out it could very well be an issue with python, though I thought it was worth checking with you guys,

kind regards,


Related issues

Related to Arvados - Bug #10629: [FUSE] slow enumerating files by collection uuidResolved11/24/2016


#1 Updated by Tom Morris almost 4 years ago

This seems like it could be related to #10629

#2 Updated by Tom Morris almost 4 years ago

  • Target version set to Arvados Future Sprints

Also available in: Atom PDF