Bug #8189


[FUSE] Listing a project directory is slow when there are many subprojects

Added by Jiayong Li over 8 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assigned To: Ward Vandewege
Category: FUSE
Target version: 2019-05-22 Sprint
Story points: -
Release relationship: Auto

Description

I created 2947 sub-projects under the project '1000_genome_exome_raw_reads' (uuid su92l-j7d0g-3c6tenm6q4xn7qm) on su92l. As a result, directory operations under that project are slow; for example, 'ls' takes nearly two minutes.


Subtasks 1 (0 open, 1 closed)

Task #8210: review 8189-handle-large-collections-better · Resolved · Tom Clegg · 01/11/2016

Related issues

Related to Arvados - Bug #8183: [Workbench] should not look up every group/project a user has access to on every page load · Resolved · Radhika Chippada · 02/12/2016
#1

Updated by Brett Smith over 8 years ago

  • Subject changed from "[Keep] Directory operations are slow after the creation of a large number of projects." to "[FUSE] Listing a project directory is slow when there are many subprojects"
#2

Updated by Tom Clegg over 8 years ago

arvados_fuse's ProjectDirectory class uses arvados.util.list_all:

    contents = arvados.util.list_all(self.api.groups().contents,
                                     self.num_retries, uuid=self.project_uuid)

arvados.util.list_all doesn't set a limit either, so each request gets the API's default limit of 100 items per page; listing 2947 subprojects therefore takes about 30 sequential API calls.

Suggest modifying arvados.util.list_all (in source:sdk/python/arvados/util.py#L365) to do something like

    kwargs.setdefault('limit', sys.maxint)

That way, the API server's MAX_LIMIT (currently 1000) will determine the page size.

The rationale: once the client is in an API request loop that it won't exit until it has all of the items, it never helps to fetch fewer items per request. Fetching fewer items per page only makes sense if the client has some chance of doing something else (exiting the loop or processing a subset of results) before receiving MAX_LIMIT results.
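
For illustration, here is a minimal sketch of list_all with the proposed default in place, assuming the loop pages through results using the 'items' and 'items_available' fields of each list response (the actual body of arvados.util.list_all may differ):

    import sys

    def list_all(fn, num_retries=0, **kwargs):
        # Ask for the largest page the server will serve; the API server
        # clamps the effective page size to its MAX_LIMIT (currently 1000).
        kwargs.setdefault('limit', sys.maxint)  # the proposed one-line fix
        items = []
        offset = 0
        items_available = sys.maxint  # unknown until the first response
        while len(items) < items_available:
            page = fn(offset=offset, **kwargs).execute(num_retries=num_retries)
            items += page['items']
            items_available = page['items_available']
            offset = len(items)
        return items

With MAX_LIMIT=1000, the 2947-item listing above drops from ~30 round trips to 3.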

(ArvadosResourceList#each_page in source:apps/workbench/app/models/arvados_resource_list.rb#177 needs this fix, too.)

#3

Updated by Ward Vandewege over 8 years ago

  • Assigned To set to Ward Vandewege
  • Target version set to 2016-01-20 Sprint
#4

Updated by Ward Vandewege over 8 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:37a1505b607bbf533512f48b47f208c5cde4c435.

#5

Updated by Tom Clegg over 8 years ago

  • Status changed from Resolved to In Progress
#6

Updated by Jiayong Li over 8 years ago

Right now the su92l workbench is still considerably slower than the qr1hi workbench.

On su92l, I also noticed a huge performance difference between a read-only mount and a writable mount (both freshly mounted to reflect recent changes).

read-only mount:

    $ time ls keep/home/arvados_genomics_benchmark/1000_genome_exome_raw_reads
    real    1m0.422s
    user    0m0.020s
    sys     0m0.060s

writable mount:

    $ time ls mnt/home/arvados_genomics_benchmark/1000_genome_exome_raw_reads
    real    94m40.150s
    user    0m0.028s
    sys     0m0.080s

#7

Updated by Brett Smith over 8 years ago

  • Target version deleted (2016-01-20 Sprint)
#8

Updated by Brett Smith over 8 years ago

  • Target version set to Arvados Future Sprints
#9

Updated by Jiayong Li about 8 years ago

I tried running a pipeline on su92l, but the "Run a pipeline" button on the workbench homepage is not clickable now.

#10

Updated by Ward Vandewege almost 5 years ago

  • Status changed from In Progress to Resolved
  • Target version changed from Arvados Future Sprints to 2019-05-22 Sprint

This was resolved long ago; here's the performance today:


    wardv@shell:~$ time ls keep/by_id/su92l-j7d0g-3c6tenm6q4xn7qm

    ...

    real    0m8.187s
    user    0m0.021s
    sys     0m0.086s
#11

Updated by Tom Morris almost 5 years ago

  • Release set to 15