[FUSE] Listing a project directory is slow when there are many subprojects
I created 2947 sub-projects under the project '1000_genome_exome_raw_reads' (UUID su92l-j7d0g-3c6tenm6q4xn7qm) on su92l. As a result, directory operations under that project are slow; for example, 'ls' takes nearly two minutes.
Make the Python SDK and workbench effectively default to the API server's MAX_LIMIT when requesting a list of objects, whenever the client code does not set an explicit limit.
#2 Updated by Tom Clegg about 6 years ago
arvados_fuse's ProjectDirectory class uses arvados.util.list_all:
contents = arvados.util.list_all(self.api.groups().contents, self.num_retries, uuid=self.project_uuid)
arvados.util.list_all doesn't set a limit either, so we get the API's default limit of 100 items per page.
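To put the default page size in perspective, here is a small arithmetic sketch of how many paged requests a full listing of the 2947 sub-projects needs at 100 items per page, compared with 1000 items per page. It assumes every page except the last comes back full and ignores any extra round trip a client might make.

```python
import math


def requests_needed(total_items: int, page_size: int) -> int:
    """Number of paged API calls needed to fetch total_items,
    assuming every page is full except possibly the last one."""
    return math.ceil(total_items / page_size)


# 2947 sub-projects, as in the project described in this issue.
print(requests_needed(2947, 100))   # → 30  (default limit of 100 per page)
print(requests_needed(2947, 1000))  # → 3   (1000 items per page)
```

Each of those requests is a full round trip to the API server, so the page size directly multiplies the latency of a single directory listing.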
Suggest modifying arvados.util.list_all (in source:sdk/python/arvados/util.py#L365) so that, when the caller has not supplied a limit, it requests a limit larger than the server will ever honor.
That way, the API server's MAX_LIMIT (currently 1000) will determine the page size.
The rationale is that, once the client is in an API request loop that it won't exit until it gets all of the items, it's never a good idea for it to get fewer than MAX_LIMIT items per API request. Getting fewer items per page only makes sense if the client has some chance of doing something else (exiting the loop or processing a subset of results) before receiving MAX_LIMIT results.
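A minimal sketch of the suggested change follows. It is not the actual SDK code: it assumes the list call accepts `offset`/`limit` keyword arguments and returns a dict with `items`, `items_available` and `offset` keys, and it omits the real SDK's `.execute(num_retries=...)` step. `fake_contents` and its `MAX_LIMIT` constant are hypothetical stand-ins for `groups().contents` and the server-side cap, which is assumed to clamp oversized limits rather than reject them.

```python
import sys
from typing import Any, Callable, Dict, List


def list_all(fn: Callable[..., Dict[str, Any]], **kwargs: Any) -> List[Any]:
    """Fetch every item from a paged list call (sketch only)."""
    # No explicit limit from the caller: ask for more than any server will
    # return, so the server's MAX_LIMIT ends up deciding the page size.
    kwargs.setdefault("limit", sys.maxsize)
    items: List[Any] = []
    offset = 0
    items_available = sys.maxsize
    while len(items) < items_available:
        page = fn(offset=offset, **kwargs)
        if not page["items"]:
            break  # guard against looping forever if the server returns nothing
        items.extend(page["items"])
        items_available = page["items_available"]
        offset = page["offset"] + len(page["items"])
    return items


# --- hypothetical fake server, for illustration only ---
MAX_LIMIT = 1000            # assumed server-side cap, mirroring the value quoted above
DATA = list(range(2947))    # stands in for the 2947 sub-projects
calls: List[int] = []       # records the page size of each request


def fake_contents(offset: int = 0, limit: int = 100) -> Dict[str, Any]:
    page_size = min(limit, MAX_LIMIT)  # server clamps the requested limit
    calls.append(page_size)
    return {"items": DATA[offset:offset + page_size],
            "items_available": len(DATA),
            "offset": offset}


if __name__ == "__main__":
    calls.clear()
    print(len(list_all(fake_contents)), len(calls))             # → 2947 3
    calls.clear()
    print(len(list_all(fake_contents, limit=100)), len(calls))  # → 2947 30
```

Using `setdefault` keeps any explicit `limit` passed by the caller intact, so code that deliberately pages in small chunks behaves as before; only the "fetch everything" default changes.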
(ArvadosResourceList#each_page in source:apps/workbench/app/models/arvados_resource_list.rb#L177 needs this fix, too.)
#6 Updated by Jiayong Li about 6 years ago
Right now su92l workbench is still considerably slower than qr1hi workbench.
On su92l, I also noticed a huge performance difference between the read-only mount and the writable mount (both freshly mounted to reflect recent changes).
$ time ls keep/home/arvados_genomics_benchmark/1000_genome_exome_raw_reads
$ time ls mnt/home/arvados_genomics_benchmark/1000_genome_exome_raw_reads
#10 Updated by Ward Vandewege over 2 years ago
- Status changed from In Progress to Resolved
- Target version changed from Arvados Future Sprints to 2019-05-22 Sprint
This was resolved long ago; here's the performance today:
wardv@shell:~$ time ls keep/by_id/su92l-j7d0g-3c6tenm6q4xn7qm
...
real	0m8.187s
user	0m0.021s
sys	0m0.086s