Bug #6219
closed
[FUSE] [Performance] Add performance tests for FUSE
Added by Brett Smith over 9 years ago.
Updated over 9 years ago.
Assigned To:
Radhika Chippada
Description
Short version: implement a separate test suite that reports timing profiles. Eventually we'll have comprehensive performance tests, but for this ticket, focus tests on operations that parse and manipulate collection objects and manifests. In particular, try to avoid talking to Keep:
- List the contents of a project that contains many collections
- List the contents of a large collection
- Update one collection by copying files from another
- (Assuming FUSE supports it) Create a collection from scratch by making a directory for it and copying files to it from an existing Collection, and saving
- (Others? But again, the focus is on collections performance.)
When in doubt, for implementation guidance, follow the pattern for the API server and Workbench established in #6087, unless that's grossly un-Pythonic. Use the large collection test fixtures created during that development.
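A minimal sketch of what one such timing test could look like. The `timed` decorator, the `TIMINGS` registry, and `make_fake_collection` are hypothetical names invented for illustration, and a plain temporary directory stands in for a FUSE mount point; none of this is taken from the Arvados test suite or from #6087.

```python
import functools
import os
import tempfile
import time

# Hypothetical registry of (label, seconds) pairs; not part of Arvados.
TIMINGS = []


def timed(label):
    """Record the wall-clock duration of each call under the given label."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                TIMINGS.append((label, time.perf_counter() - start))
        return wrapper
    return decorator


@timed("list files")
def list_directory(path):
    # On a real mount this would be the directory of a large collection.
    return sorted(os.listdir(path))


def make_fake_collection(root, streams=2, files_per_stream=200):
    """Create streams * files_per_stream one-byte files under root."""
    for s in range(streams):
        stream_dir = os.path.join(root, "stream%d" % s)
        os.mkdir(stream_dir)
        for f in range(files_per_stream):
            with open(os.path.join(stream_dir, "file%d.txt" % f), "w") as fh:
                fh.write("x")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_fake_collection(root)
        names = list_directory(os.path.join(root, "stream0"))
        for label, seconds in TIMINGS:
            print("%s: %.3fs (%d entries)" % (label, seconds, len(names)))
```

Keeping fixture creation outside the timed function means the recorded figure covers only the listing itself, which matches the goal of profiling each operation in isolation.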
- Description updated (diff)
- Category set to FUSE
- Subject changed from [FUSE] Add performance tests for FUSE to [FUSE] [Performance] Add performance tests for FUSE
- Assigned To set to Radhika Chippada
It may go without saying, but writing tests for write operations is blocked by #3198. However, it should be possible to start the performance test framework with tests for read operations, in a way that's extensible to write operations as well.
- Status changed from New to In Progress
Regarding branch 6219-fuse-performance-testing:
- This branch is from 3198-writable-fuse branch
- Includes some tests for testing collection operations
- Create collection with files with multiple blocks, then move and remove a file. The test takes forever if I use too many files or blocks, so I used 6 files with 2 blocks each.
- Create collection and add 2 streams with 200 files each to it; move and remove one file. Similar to the above test, it takes too long if I push the numbers much higher than this.
- Create collection and add files using magic dir; move all files into another collection
- Create collection and add files using magic dir; move one file at a time into another collection
- I could not figure out how to add multiple streams to a collection when testing with magic dir. During review, please provide a hint as to how to do this. Thanks.
Radhika Chippada wrote:
I did not add a test to retrieve project contents at this time.
Was there some obstacle that made this impossible or impractical at this time? The performance of this specific operation is one that we've received bug reports about (e.g., #6019), so it's one that we know for sure users have interest in. The same goes for listing files in a large collection (#5662)—I'm not sure if that's implicitly covered in one of your existing tests, but it would be good to have profiling results for that in isolation.
The branch now contains a test that lists a project's contents as well.
Radhika Chippada wrote:
The branch now contains a test that lists a project's contents as well.
Thanks. And listing the files in a collection with many files?
In magicDirTestMoveFiles_oneEachIntoAnother, the @profiled annotation is around a loop which calls "pool.apply". I don't know how much overhead the Python multiprocessing adds compared to the function being called, so it would be better for the outer loop and @profiled annotation to be part of magicDirTest_MoveFileFromCollection instead.
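To illustrate the concern, here is a hedged sketch that times the same work at both levels. The `profiled` decorator below is a hypothetical stand-in, not the one in the branch, and `ThreadPool` is used only as an assumption to keep the example self-contained; with a process pool, samples recorded in the worker would not be visible in the parent.

```python
import functools
import time
from multiprocessing.pool import ThreadPool


# Hypothetical stand-in for the profiling decorator discussed above.
def profiled(samples):
    """Append the wall-clock duration of each call to the given list."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                samples.append(time.perf_counter() - start)
        return wrapper
    return decorator


inner_samples = []  # one entry per file operation
outer_samples = []  # one entry for the whole loop, dispatch included


@profiled(inner_samples)
def move_one_file(index):
    # Placeholder for the real work (for example, a rename on the mount).
    time.sleep(0.001)
    return index


@profiled(outer_samples)
def move_all_files(pool, count):
    # Timing here also includes the cost of handing each call to the pool.
    return [pool.apply(move_one_file, (i,)) for i in range(count)]


if __name__ == "__main__":
    with ThreadPool(1) as pool:
        move_all_files(pool, 20)
    work = sum(inner_samples)
    total = outer_samples[0]
    print("work: %.4fs, loop total: %.4fs, dispatch overhead: %.4fs"
          % (work, total, total - work))
```

The difference between the loop total and the summed per-call samples is the dispatch overhead, which is what profiling only the outer loop would fold into the result.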
Thanks Peter for the comments. I updated the branch to profile smaller code fragments. Please take another look.
- Target version changed from 2015-07-08 sprint to 2015-07-22 sprint
- Story points changed from 2.0 to 0.5
I'm not entirely sure this answers the performance questions about FUSE that I would want to ask, but this branch has been outstanding for a while so let's go ahead and merge.
Peter Amstutz wrote:
I'm not entirely sure this answers the performance questions about FUSE that I would want to ask, but this branch has been outstanding for a while so let's go ahead and merge.
One of our criteria for branch reviews is, "Does the branch do what the story specifies?" So, does this branch provide profile data for the use cases listed in the description? If so, then we're golden. If not, I'd like to know more about what the discrepancies are and why they exist.
- Using FUSE for a collection with streams: 2, files_per_stream: 200, bytes_per_file: 1
| create collection | list files | move one file | remove one file |
| 45.577s | 0.995s | 0.132s | 0.111s |
- Using FUSE for a collection with streams: 2, files_per_stream: 3, blocks_per_file: 2, bytes_per_block: 2**26
| create collection | list files | move one file | remove one file |
| 46.351s | 54.671s | 0.092s | 0.100s |
- Using magic dir for a collection with streams: 2, files_per_stream: 200, bytes_per_file: 1
| create collection | list files | move one file | remove one file | move all files |
| 0.58s | 0.604s | 0.210s | 0.103s | 34.935s |
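As a rough way to read the magic dir figures, the bulk move can be converted to an average per-file cost. This is only a sketch: it assumes "move all files" covers every file in the collection (2 streams of 200 files), which the results do not state explicitly.

```python
# Figures copied from the magic dir results above.
streams = 2
files_per_stream = 200
move_all_seconds = 34.935
move_one_seconds = 0.210

# Assumption: "move all files" covers every file in the collection.
total_files = streams * files_per_stream
per_file_seconds = move_all_seconds / total_files

print("files moved: %d" % total_files)
print("average per file in bulk move: %.3fs" % per_file_seconds)
print("single-file move: %.3fs" % move_one_seconds)
```

Under that assumption the bulk move averages about 0.087s per file across 400 files.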
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:39c75ea686e2326508fd8e3d0be31cdde7906597.