Bug #10315

Performance issues on arv-put command version from '9701-collection-pack-small-files-alt' branch

Added by Lucas Di Pentima almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Start date:
10/20/2016
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

It seems that the new arv-put command has problems when asked to upload lots of files.


Subtasks

Task #10316: Review 10315-new-arv-put-performanceResolvedLucas Di Pentima

Associated revisions

Revision 15d8499a
Added by Lucas Di Pentima almost 3 years ago

Merge branch '10315-new-arv-put-performance'
Closes #10315

History

#1 Updated by Lucas Di Pentima almost 3 years ago

Some tests:

  • Test 1: File size = 1 byte - Arvbox local keep store - Cache update interval 1s
    • 100 files: 0m0.473s
    • 1000 files: 0m1.843s
    • 10000 files: 3m7.695s (!!!!)
  • Test 2: File size 1 Kbyte - Arvbox local keep store - Cache update interval 1s
    • 100 files: 0m0.492s
    • 1000 files: 0m1.871s
    • 10000 files: 3m19.292s (seems that file size is not important)
  • Test 3: File size 1 Kbyte - Arvbox local keep store - Cache update interval 60s
    • 10000 files: 2m45.130s (seems that the issue is not about updating an increasingly bigger cache file)

#2 Updated by Lucas Di Pentima almost 3 years ago

Found what the problem was: For every small file close() call, the BlockManager.repack_small_files() method is called, and the first thing that used to do is a list comprehension searching for all pending buffer blocks on its internal list, and then adding the size() of every one of them to check if there're enough to fill an entire block.

This list comprehension execution was the slow operation, when the number of pending buffer blocks was > 1000, it was starting to add up lots of time.

The solution I came up with is to add an internal variable to the BlockManager class to keep count of the sum of all the pending buffer block's sizes, this counter get added on every small file close() call and the slow list comprehension is executed only when the counter gets to at least the keep block size.

My local tests came with around 10X speed improvement when testing with 5000 files of 1 KB each. Improvement ratios should be a lot higher when tests are performed with even more files.

These changes are at the 10315-new-arv-put-performance branch at: 6f04833

Test suite running at: https://ci.curoverse.com/job/developer-run-tests/47/

#3 Updated by Lucas Di Pentima almost 3 years ago

Found a pre-existing locking bug (already there at 9701 branch): When having enough small blocks to fill an entire 64MB block, it gets locked when calling commit_bufferblock() with sync=False because that method calls start_put_threads() and that's a @synchronized method.
Thinking about wrapping a non-synchronized version of it.

#4 Updated by Lucas Di Pentima almost 3 years ago

Commit: 078bdb1

Solved the locking issue by moving the start_put_threads() call into BlockManager's __init__()

New test suite run at: https://ci.curoverse.com/job/developer-run-tests/48/

#5 Updated by Lucas Di Pentima almost 3 years ago

Additional stats data: 100000 files of 1KB each: 2 minutes 14 secs at a local arvbox installation.

#6 Updated by Lucas Di Pentima almost 3 years ago

  • Target version changed from 2016-10-26 sprint to 2016-11-09 sprint

#7 Updated by Peter Amstutz almost 3 years ago

Reviewing 10315-new-arv-put-performance @ 078bdb166766423e1a423523e5285966aff7ec6b

The main reason for lazy start of the put threads is to avoid creating put threads for read-only collections. Could you have CollectionReader() pass in a flag that skips starting the put threads?

Caching _pending_write_size is a good idea. However, would it make sense to recompute pending_write_size after collecting small_blocks and then check it again? (For example, a file could be opened, written, and closed multiple times, which may cause it to be double-counted in this implementation).

#8 Updated by Lucas Di Pentima almost 3 years ago

Updates at 95fc987

  • About put threads lazy start: I set up an exclusive lock for this operation, and leave the start call where it was previously, at commit_bufferblock() method.
  • Updated pending write size count as requested.

Test suite being run at: https://ci.curoverse.com/job/developer-run-tests/49/

Thanks!

#9 Updated by Peter Amstutz almost 3 years ago

Lucas Di Pentima wrote:

Updates at 95fc987

  • About put threads lazy start: I set up an exclusive lock for this operation, and leave the start call where it was previously, at commit_bufferblock() method.

Looks good.

  • Updated pending write size count as requested.

Suggested tweak:

            self._pending_write_size = sum([b.size() for b in small_blocks])
            if self._pending_write_size < config.KEEP_BLOCK_SIZE and not force:
                return

#10 Updated by Lucas Di Pentima almost 3 years ago

Branch updated with the suggested change and master branch merged in at 734335da27f27e2177d3b931b1e5e9e8e83a042f
Tests successfully run: https://ci.curoverse.com/job/developer-run-tests/51/

#11 Updated by Peter Amstutz almost 3 years ago

Lucas Di Pentima wrote:

Branch updated with the suggested change and master branch merged in at 734335da27f27e2177d3b931b1e5e9e8e83a042f
Tests successfully run: https://ci.curoverse.com/job/developer-run-tests/51/

Thanks, LGTM, please merge.

#12 Updated by Lucas Di Pentima almost 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:15d8499a8d95abbc4bc2dbbd0bcfd2a4c6666408.

Also available in: Atom PDF