Bug #8189


[FUSE] Listing a project directory is slow when there are many subprojects

Added by Jiayong Li over 8 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assigned To: Ward Vandewege
Category: FUSE
Target version: 2019-05-22 Sprint
Story points: -
Release relationship: Auto

Description

I created 2947 sub-projects under the project '1000_genome_exome_raw_reads' (uuid su92l-j7d0g-3c6tenm6q4xn7qm) on su92l. As a result, directory operations under that project are slow; for example, 'ls' takes nearly two minutes.


Subtasks 1 (0 open, 1 closed)

Task #8210: review 8189-handle-large-collections-better · Resolved · Tom Clegg · 01/11/2016

Related issues

Related to Arvados - Bug #8183: [Workbench] should not look up every group/project a user has access to on every page load · Resolved · Radhika Chippada · 02/12/2016
#1

Updated by Brett Smith over 8 years ago

  • Subject changed from "[Keep] Directory operations are slow after the creation of a large number of projects." to "[FUSE] Listing a project directory is slow when there are many subprojects"
#2

Updated by Tom Clegg over 8 years ago

arvados_fuse's ProjectDirectory class uses arvados.util.list_all:

    contents = arvados.util.list_all(self.api.groups().contents,
                                     self.num_retries, uuid=self.project_uuid)

arvados.util.list_all doesn't set a limit either, so each request gets the API's default limit of 100 items per page; listing 2947 subprojects therefore takes about 30 sequential API calls.

Suggest modifying arvados.util.list_all (in source:sdk/python/arvados/util.py#L365) to do something like

    kwargs.setdefault('limit', sys.maxint)

That way, the API server's MAX_LIMIT (currently 1000) will determine the page size.

The rationale: once the client is in an API request loop that it won't exit until it has all of the items, it never helps to fetch fewer items per request. Fetching fewer items per page only makes sense if the client has some chance of doing something else (exiting the loop or processing a subset of results) before receiving MAX_LIMIT results.
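
For illustration, here is a minimal sketch of list_all with the proposed default in place, assuming the loop pages through results using the 'items' and 'items_available' fields of each list response (the actual body of arvados.util.list_all may differ):

    import sys

    def list_all(fn, num_retries=0, **kwargs):
        # Ask for the largest page the server will serve; the API server
        # clamps the effective page size to its MAX_LIMIT (currently 1000).
        kwargs.setdefault('limit', sys.maxint)  # the proposed one-line fix
        items = []
        offset = 0
        items_available = sys.maxint  # unknown until the first response
        while len(items) < items_available:
            page = fn(offset=offset, **kwargs).execute(num_retries=num_retries)
            items += page['items']
            items_available = page['items_available']
            offset = len(items)
        return items

With MAX_LIMIT=1000, the 2947-item listing above drops from ~30 round trips to 3.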

(ArvadosResourceList#each_page in source:apps/workbench/app/models/arvados_resource_list.rb#177 needs this fix, too.)

#3

Updated by Ward Vandewege over 8 years ago

  • Assigned To set to Ward Vandewege
  • Target version set to 2016-01-20 Sprint
#4

Updated by Ward Vandewege over 8 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:37a1505b607bbf533512f48b47f208c5cde4c435.

#5

Updated by Tom Clegg over 8 years ago

  • Status changed from Resolved to In Progress
#6

Updated by Jiayong Li over 8 years ago

Right now the su92l workbench is still considerably slower than the qr1hi workbench.

On su92l, I also noticed a huge performance difference between a read-only mount and a writable mount (both freshly mounted to reflect recent changes).

read-only mount:

    $ time ls keep/home/arvados_genomics_benchmark/1000_genome_exome_raw_reads
    real    1m0.422s
    user    0m0.020s
    sys     0m0.060s

writable mount:

    $ time ls mnt/home/arvados_genomics_benchmark/1000_genome_exome_raw_reads
    real    94m40.150s
    user    0m0.028s
    sys     0m0.080s

#7

Updated by Brett Smith over 8 years ago

  • Target version deleted (2016-01-20 Sprint)
#8

Updated by Brett Smith over 8 years ago

  • Target version set to Arvados Future Sprints
#9

Updated by Jiayong Li about 8 years ago

I tried running a pipeline on su92l, but the "Run a pipeline" button on the workbench homepage is not clickable now.

#10

Updated by Ward Vandewege almost 5 years ago

  • Status changed from In Progress to Resolved
  • Target version changed from Arvados Future Sprints to 2019-05-22 Sprint

This was resolved long ago; here's the performance today:


    wardv@shell:~$ time ls keep/by_id/su92l-j7d0g-3c6tenm6q4xn7qm

    ...

    real    0m8.187s
    user    0m0.021s
    sys     0m0.086s
#11

Updated by Tom Morris almost 5 years ago

  • Release set to 15