Bug #6219
Closed: [FUSE] [Performance] Add performance tests for FUSE
Description
Short version: implement a separate test suite that reports timing profiles. Eventually we'll have comprehensive performance tests, but for this ticket, focus tests on operations that parse and manipulate collection objects and manifests, and try to avoid talking to Keep. Operations to cover:
- List the contents of a project that contains many collections
- List the contents of a large collection
- Update one collection by copying files from another
- (Assuming FUSE supports it) Create a collection from scratch by making a directory for it, copying files into it from an existing collection, and saving it
- (Others? But again, the focus is on collections performance.)
When in doubt, for implementation guidance, follow the pattern for the API server and Workbench established in #6087, unless that's grossly un-Pythonic. Use the large collection test fixtures created during that development.
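As a starting point for the "list the contents of a large collection" case, here is a minimal sketch of a timing test. It is only an illustration: `time_call`, `make_stand_in_dir`, and `run_listing_profile` are hypothetical names, and a temporary local directory stands in for a mounted collection so that nothing talks to Keep or the API server. A real test would point at the FUSE mount and the large collection fixtures instead.

```python
import os
import tempfile
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def make_stand_in_dir(parent, count):
    """Create `count` empty files under parent.

    Assumption: this local directory is only a stand-in for a mounted
    collection; it does not exercise FUSE or Arvados at all.
    """
    path = os.path.join(parent, "large_collection")
    os.mkdir(path)
    for i in range(count):
        with open(os.path.join(path, "file%05d.txt" % i), "w"):
            pass
    return path


def run_listing_profile(count=200):
    """Time one directory listing and return (entry_count, elapsed_seconds)."""
    with tempfile.TemporaryDirectory() as parent:
        path = make_stand_in_dir(parent, count)
        names, elapsed = time_call(os.listdir, path)
    return len(names), elapsed
```

Because the timing helper is separate from the operation being timed, the same shape should extend to write operations (copy, move, remove) once they are available.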
Updated by Brett Smith over 9 years ago
- Description updated (diff)
- Category set to FUSE
Updated by Radhika Chippada over 9 years ago
- Subject changed from [FUSE] Add performance tests for FUSE to [FUSE] [Performance] Add performance tests for FUSE
Updated by Peter Amstutz over 9 years ago
See https://arvados.org/issues/3198#note-39 and https://arvados.org/issues/3198#note-40 for some benchmarking notes around FUSE and Collections.
Updated by Brett Smith over 9 years ago
- Assigned To set to Radhika Chippada
Follow the discussion starting at https://arvados.org/issues/6218#note-5 for implementation ideas.
Updated by Brett Smith over 9 years ago
It may go without saying, but writing tests for write operations is blocked by #3198. However, it should be possible to start the performance test framework with tests for read operations, in a way that's extensible to write operations as well.
Updated by Radhika Chippada over 9 years ago
- Status changed from New to In Progress
Updated by Radhika Chippada over 9 years ago
Regarding branch 6219-fuse-performance-testing:
- This branch is based on the 3198-writable-fuse branch
- Includes tests for collection operations:
  - Create a collection containing files with multiple blocks, then move and remove a file. The test takes forever if I use too many files or blocks, so I used 6 files with 2 blocks each.
  - Create a collection and add 2 streams with 200 files each to it, then move and remove one file. As with the test above, it takes too long if I push the numbers much higher than this.
  - Create a collection and add files using the magic dir, then move all files into another collection.
  - Create a collection and add files using the magic dir, then move one file at a time into another collection.
- I could not figure out how to add multiple streams to a collection when testing with the magic dir. During review, please provide a hint as to how to do this. Thanks.
Updated by Brett Smith over 9 years ago
Radhika Chippada wrote:
I did not add a test to retrieve project contents at this time.
Was there some obstacle that made this impossible or impractical at this time? The performance of this specific operation is one that we've received bug reports about (e.g., #6019), so it's one that we know for sure users have interest in. The same goes for listing files in a large collection (#5662)—I'm not sure if that's implicitly covered in one of your existing tests, but it would be good to have profiling results for that in isolation.
Updated by Radhika Chippada over 9 years ago
The branch now contains a test that lists a project's contents as well.
Updated by Brett Smith over 9 years ago
Radhika Chippada wrote:
The branch now contains a test that lists a project's contents as well.
Thanks. And listing the files in a collection with many files?
Updated by Peter Amstutz over 9 years ago
In magicDirTestMoveFiles_oneEachIntoAnother, the @profiled annotation is around a loop which calls "pool.apply". I don't know how much overhead Python multiprocessing adds compared to the function being called, so it would be better for the outer loop and the @profiled annotation to be part of magicDirTest_MoveFileFromCollection instead.
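To illustrate the suggestion, here is a rough sketch of timing the per-file operation directly rather than the loop that dispatches it. This is not the branch's actual decorator: `profiled_sketch`, `TIMINGS`, `move_one_file`, and `move_all_files` are hypothetical names, and plain dicts stand in for collections.

```python
import functools
import time

# Hypothetical store of per-call timings, keyed by function name.
TIMINGS = {}


def profiled_sketch(func):
    """Record the wall-clock time of each call to func (illustrative only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            TIMINGS.setdefault(func.__name__, []).append(elapsed)
    return wrapper


@profiled_sketch
def move_one_file(src, dst, name):
    """Move one entry between two dicts standing in for collections."""
    dst[name] = src.pop(name)


def move_all_files(src, dst):
    """Undecorated driver: loop and dispatch overhead stay out of the timings."""
    for name in list(src):
        move_one_file(src, dst, name)
```

With the decorator on the inner function, each recorded sample covers one move, so any cost added by the dispatching layer (multiprocessing in the real test) is excluded from the profile.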
Updated by Radhika Chippada over 9 years ago
Thanks Peter for the comments. I updated the branch to profile smaller code fragments. Please take another look.
Updated by Brett Smith over 9 years ago
- Target version changed from 2015-07-08 sprint to 2015-07-22 sprint
Updated by Radhika Chippada over 9 years ago
- Story points changed from 2.0 to 0.5
Updated by Peter Amstutz over 9 years ago
I'm not entirely sure this answers the performance questions about FUSE that I would want to ask, but this branch has been outstanding for a while so let's go ahead and merge.
Updated by Brett Smith over 9 years ago
Peter Amstutz wrote:
I'm not entirely sure this answers the performance questions about FUSE that I would want to ask, but this branch has been outstanding for a while so let's go ahead and merge.
One of our criteria for branch reviews is, "Does the branch do what the story specifies?" So, does this branch provide profile data for the use cases listed in the description? If so, then we're golden. If not, I'd like to know more about what the discrepancies are and why they exist.
Updated by Radhika Chippada over 9 years ago
- Using FUSE for a collection with streams: 2, files_per_stream: 200, bytes_per_file: 1
create collection | list files | move one file | remove one file |
          45.577s |     0.995s |        0.132s |          0.111s |
- Using FUSE for a collection with streams: 2, files_per_stream: 3, blocks_per_file: 2, bytes_per_block: 2**26
create collection | list files | move one file | remove one file |
          46.351s |    54.671s |        0.092s |          0.100s |
- Using Magic Dir for a collection with streams: 2, files_per_stream: 200, bytes_per_file: 1
create collection | list files | move one file | remove one file | move all files |
            0.58s |     0.604s |        0.210s |          0.103s |        34.935s |
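For a rough per-file reading of these totals, the arithmetic can be sketched as below. It assumes that "create collection" in the first run and "move all files" in the Magic Dir run each touched every file, i.e. streams × files_per_stream = 2 × 200 = 400 files. The results do not state that count explicitly, so treat the averages as approximate. `per_file_seconds` is a hypothetical helper.

```python
def per_file_seconds(total_seconds, streams, files_per_stream):
    """Average seconds per file, assuming every file was touched once."""
    return total_seconds / (streams * files_per_stream)


# Totals copied from the reported results; the file count is an assumption.
create_per_file = per_file_seconds(45.577, 2, 200)    # FUSE, create collection
move_all_per_file = per_file_seconds(34.935, 2, 200)  # Magic Dir, move all files

print("create: %.4fs per file" % create_per_file)
print("move all: %.4fs per file" % move_all_per_file)
```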
Updated by Radhika Chippada over 9 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:39c75ea686e2326508fd8e3d0be31cdde7906597.