Bug #6219

[FUSE] [Performance] Add performance tests for FUSE

Added by Brett Smith over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To: Radhika Chippada
Category: FUSE
Target version:
Start date: 06/30/2015
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

Short version: implement a separate test suite that reports timing profiles. Eventually we'll have comprehensive performance tests, but for this ticket, focus on operations that parse and manipulate collection objects and manifests, and try to avoid talking to Keep. Operations to cover:

  • List the contents of a project that contains many collections
  • List the contents of a large collection
  • Update one collection by copying files from another
  • (Assuming FUSE supports it) Create a collection from scratch by making a directory for it, copying files to it from an existing collection, and saving it
  • (Others? But again, the focus is on collections performance.)

When in doubt, for implementation guidance, follow the pattern for the API server and Workbench established in #6087, unless that's grossly un-Pythonic. Use the large collection test fixtures created during that development.
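As a rough illustration of a test that reports a timing profile for one of the operations above (listing a directory with many entries), here is a minimal Python sketch. It uses a temporary local directory as a stand-in for a FUSE mount, so it measures nothing about Arvados itself. The names `timed` and `make_many_entries` are hypothetical and are not part of the Arvados test suite.

```python
import os
import tempfile
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds, filled in by timed()

@contextmanager
def timed(label):
    """Record the wall-clock time spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

def make_many_entries(root, count):
    """Create `count` empty subdirectories (hypothetical stand-ins for collections)."""
    for i in range(count):
        os.mkdir(os.path.join(root, "collection_%04d" % i))

# Assumption: a local temporary directory plays the role of the mounted project.
with tempfile.TemporaryDirectory() as project_dir:
    make_many_entries(project_dir, 500)
    with timed("list project contents"):
        names = os.listdir(project_dir)

for label, seconds in timings.items():
    print("%s: %.3fs (%d entries)" % (label, seconds, len(names)))
```

Keeping the fixture setup outside the `timed` block means only the listing itself is measured, which matches the ticket's goal of profiling individual operations.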


Subtasks

Task #6452: Review branch: 6219-fuse-performance-testing (from 3198-writable-fuse) branch (Resolved, Peter Amstutz)

Associated revisions

Revision 39c75ea6
Added by Radhika Chippada over 6 years ago

closes #6219
Merge branch '6219-fuse-performance-testing'

History

#1 Updated by Brett Smith over 6 years ago

  • Description updated
  • Category set to FUSE

#2 Updated by Brett Smith over 6 years ago

  • Story points set to 2.0

#3 Updated by Radhika Chippada over 6 years ago

  • Subject changed from [FUSE] Add performance tests for FUSE to [FUSE] [Performance] Add performance tests for FUSE

#4 Updated by Peter Amstutz over 6 years ago

See https://arvados.org/issues/3198#note-39 and https://arvados.org/issues/3198#note-40 for some benchmarking notes around FUSE and Collections.

#5 Updated by Brett Smith over 6 years ago

  • Assigned To set to Radhika Chippada

Follow the discussion starting at https://arvados.org/issues/6218#note-5 for implementation ideas.

#6 Updated by Brett Smith over 6 years ago

It may go without saying, but writing tests for write operations is blocked by #3198. However, it should be possible to start the performance test framework with tests for read operations, in a way that's extensible to write operations as well.

#7 Updated by Radhika Chippada over 6 years ago

  • Status changed from New to In Progress

#8 Updated by Radhika Chippada over 6 years ago

Regarding branch 6219-fuse-performance-testing:

  • This branch is from 3198-writable-fuse branch
  • Includes some tests for collection operations:
    • Create collection with files with multiple blocks; and move and remove a file. The test takes forever if I use too many files or blocks, so I used 6 files with 2 blocks each.
    • Create collection and add 2 streams with 200 files each to it; move and remove one file. Similar to the above test, it takes too long if I push the numbers much higher than this.
    • Create collection and add files using magic dir; move all files into another collection
    • Create collection and add files using magic dir; move one file at a time into another collection
  • I could not figure out how to add multiple streams to a collection when testing with magic dir. During review, please provide a hint as to how to do this. Thanks.
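The test with 2 streams of 200 files each described above can be sketched with ordinary filesystem calls. This is only an illustration, under the assumptions that a temporary local directory stands in for the mounted collection and that each stream maps to a subdirectory. `build_collection_dir` and `count_files` are hypothetical helpers; the real tests on the branch run against a FUSE mount.

```python
import os
import shutil
import tempfile

def build_collection_dir(root, streams, files_per_stream, bytes_per_file=1):
    """Create one subdirectory per stream, each holding small files (hypothetical layout)."""
    for s in range(streams):
        stream_dir = os.path.join(root, "stream%d" % s)
        os.mkdir(stream_dir)
        for f in range(files_per_stream):
            with open(os.path.join(stream_dir, "file%d.txt" % f), "wb") as fh:
                fh.write(b"x" * bytes_per_file)

def count_files(root):
    """Count regular files anywhere under `root`."""
    return sum(len(files) for _, _, files in os.walk(root))

# Assumption: a local temporary directory plays the role of the collection.
with tempfile.TemporaryDirectory() as root:
    build_collection_dir(root, streams=2, files_per_stream=200)
    total_created = count_files(root)

    # Move one file from the first stream into the second.
    shutil.move(os.path.join(root, "stream0", "file0.txt"),
                os.path.join(root, "stream1", "moved_file0.txt"))
    total_after_move = count_files(root)

    # Remove one file.
    os.remove(os.path.join(root, "stream0", "file1.txt"))
    total_after_remove = count_files(root)

print(total_created, total_after_move, total_after_remove)  # 400 400 399
```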

#9 Updated by Brett Smith over 6 years ago

Radhika Chippada wrote:

I did not add a test to retrieve project contents at this time.

Was there some obstacle that made this impossible or impractical at this time? The performance of this specific operation is one that we've received bug reports about (e.g., #6019), so it's one that we know for sure users have interest in. The same goes for listing files in a large collection (#5662)—I'm not sure if that's implicitly covered in one of your existing tests, but it would be good to have profiling results for that in isolation.

#10 Updated by Radhika Chippada over 6 years ago

The branch now contains a test that lists a project's contents as well.

#11 Updated by Brett Smith over 6 years ago

Radhika Chippada wrote:

The branch now contains a test that lists a project's contents as well.

Thanks. And listing the files in a collection with many files?

#12 Updated by Peter Amstutz over 6 years ago

In magicDirTestMoveFiles_oneEachIntoAnother:

The @profiled annotation is around a loop that calls "pool.apply". I don't know how much overhead Python multiprocessing adds compared to the function being called, so it would be better for the outer loop and the @profiled annotation to be part of magicDirTest_MoveFileFromCollection instead.
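To make the concern concrete, here is a small sketch of how the placement of a timing decorator changes what gets measured. The `profiled` decorator below is a hypothetical stand-in, not the one on the branch. `dispatch` merely imitates the per-call overhead of something like `pool.apply` with a short sleep; no real multiprocessing is involved.

```python
import functools
import time

elapsed = {}  # function name -> total seconds measured across all calls

def profiled(func):
    """Hypothetical timing decorator: accumulates wall-clock time per function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed[func.__name__] = (elapsed.get(func.__name__, 0.0)
                                      + time.perf_counter() - start)
    return wrapper

def dispatch(func, *args):
    """Imitates dispatch overhead (assumed 10 ms here) before calling func."""
    time.sleep(0.01)
    return func(*args)

@profiled
def move_file(index):
    """The operation of interest; a trivial placeholder in this sketch."""
    return index * 2

@profiled
def move_all_via_dispatch(count):
    """Timing this outer loop also counts the dispatch overhead."""
    return [dispatch(move_file, i) for i in range(count)]

results = move_all_via_dispatch(5)
print("outer loop:  %.3fs" % elapsed["move_all_via_dispatch"])
print("inner calls: %.3fs" % elapsed["move_file"])
```

In this sketch the outer measurement is dominated by the simulated dispatch cost, while the inner measurement reflects only the operation itself. That is the distinction the comment above is asking for.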

#13 Updated by Radhika Chippada over 6 years ago

Thanks Peter for the comments. I updated the branch to profile smaller code fragments. Please take another look.

#14 Updated by Brett Smith over 6 years ago

  • Target version changed from 2015-07-08 sprint to 2015-07-22 sprint

#15 Updated by Radhika Chippada over 6 years ago

  • Story points changed from 2.0 to 0.5

#16 Updated by Peter Amstutz over 6 years ago

I'm not entirely sure this answers the performance questions about FUSE that I would want to ask, but this branch has been outstanding for a while so let's go ahead and merge.

#17 Updated by Brett Smith over 6 years ago

Peter Amstutz wrote:

I'm not entirely sure this answers the performance questions about FUSE that I would want to ask, but this branch has been outstanding for a while so let's go ahead and merge.

One of our criteria for branch reviews is, "Does the branch do what the story specifies?" So, does this branch provide profile data for the use cases listed in the description? If so, then we're golden. If not, I'd like to know more about what the discrepancies are and why they exist.

#18 Updated by Radhika Chippada over 6 years ago

  • Using FUSE for a collection with streams: 2, files_per_stream: 200, bytes_per_file: 1
      create collection   list files   move one file   remove one file
      45.577s             0.995s       0.132s          0.111s
  • Using FUSE for a collection with streams: 2, files_per_stream: 3, blocks_per_file: 2, bytes_per_block: 2**26
      create collection   list files   move one file   remove one file
      46.351s             54.671s      0.092s          0.100s
  • Using magic dir for a collection with streams: 2, files_per_stream: 200, bytes_per_file: 1
      create collection   list files   move one file   remove one file   move all files
      0.58s               0.604s       0.210s          0.103s            34.935s
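For context when comparing the results above, the total amount of file data implied by each parameter set can be worked out directly from the stated parameters. The short calculation below only does that arithmetic. The function name is hypothetical, and the sketch makes no claim about why the timings differ.

```python
def total_bytes(streams, files_per_stream, bytes_per_file):
    """Total file data implied by a parameter set (hypothetical helper)."""
    return streams * files_per_stream * bytes_per_file

# streams: 2, files_per_stream: 200, bytes_per_file: 1
small = total_bytes(streams=2, files_per_stream=200, bytes_per_file=1)

# streams: 2, files_per_stream: 3, blocks_per_file: 2, bytes_per_block: 2**26
large = total_bytes(streams=2, files_per_stream=3, bytes_per_file=2 * 2**26)

print(small)           # 400 bytes
print(large)           # 805306368 bytes
print(large // 2**20)  # 768 MiB
```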

#19 Updated by Radhika Chippada over 6 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados commit 39c75ea686e2326508fd8e3d0be31cdde7906597.
