Bug #10813

closed

Improve performance of arv-put

Added by Tom Morris over 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 1.0

Description

Uploading BCL files using arv-put only achieves 5-10 MB/s while using 35% CPU. This is too slow on transfer and too high on CPU usage. It also appears that performance consistently drifts down over the course of an upload, which may indicate an issue with processing large manifests.

Here's a little ASCII art graphic from bmon:

     MiB                       (RX Bytes/hour)                                           MiB                       (TX Bytes/hour)     
   48.40 ....................................|||.....................                   8.04 .............................|||............................     
   40.33 ....................................|||.....................                   6.70 .................|||||||||||||||............................     
   32.26 ...................................||||.....................                   5.36 .....|||||||||||||||||||||||||||.........|||||..............     
   24.20 |..................................||||.....................                   4.02 .....|||||||||||||||||||||||||||.........|||||..............     
   16.13 |..................................||||.....................                   2.68 ....|||||||||||||||||||||||||||||.......||||||..............     
    8.07 |::::::::::::::::::::::::::::::::::||||:::::::..............                   1.34 ::::|||||||||||||||||||||||||||||:::::::||||||..............     
         1   5   10   15   20   25   30   35   40   45   50   55   60                        1   5   10   15   20   25   30   35   40   45   50   55   60     

The 50 MB/s download in hours 39-36 is from Azure blob storage to a local shell node using the blobxfer utility. The arv-put bandwidth starts at ~8 MB/s in hour 33 and drifts down to ~5 MB/s in hour 6, averaging 6 MB/s for the entire 30 hours.

The performance goal is at least a 4x improvement to 25 MB/s, but achieving parity with blobxfer (~50 MB/s) would be even better.
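To put those rates in terms of wall-clock time, here is a rough back-of-the-envelope sketch. It assumes the ~600 GB run size quoted elsewhere in this description and decimal units (1 GB = 1000 MB); the function name is just for illustration.

```python
def transfer_hours(total_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move total_gb gigabytes at a sustained rate (decimal units)."""
    return total_gb * 1000 / rate_mb_per_s / 3600

# Approximate size of one sequencer run, taken from this description.
DATASET_GB = 600

for label, rate in [("arv-put observed", 6), ("goal", 25), ("blobxfer parity", 50)]:
    print(f"{label:17s} {rate:3d} MB/s  {transfer_hours(DATASET_GB, rate):5.1f} h")
```

At 6 MB/s this works out to a bit under 28 hours, which is in line with the roughly 30-hour upload seen in the bmon graph.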

Here's a detailed bandwidth graph of what a blobxfer transfer looks like:

    MiB                      (RX Bytes/second)                                          KiB                      (TX Bytes/second)
   97.84 ..|...........................||....|...............|.......                 133.63 .........................|..................................
   81.54 ..||....||...||....||...|||...||....||...||....||...|||...||                 111.36 .........................|..........||......................
   65.23 ..|||..|||...|||..|||...|||...|||..|||...||...|||...|||...||                  89.09 ..||...|||...|||...||...|||...||....||....|....|....|||....|
   48.92 ..|||..|||...|||..|||...|||...|||..|||...|||..|||...|||..|||                  66.82 ..|||..|||...|||..|||...|||...|||..|||...||...|||...|||...||
   32.61 ..|||..|||..||||..||||..|||..||||..|||..||||..||||..|||..|||                  44.54 ..|||..|||...|||..||||..|||...|||..|||...|||..||||..|||..|||
   16.31 :||||||||||:||||::||||:|||||:||||::||||:|||||:||||::|||::|||                  22.27 ::|||||||||:||||::||||||||||:||||::||||:|||||:||||::|||::|||
         1   5   10   15   20   25   30   35   40   45   50   55   60                        1   5   10   15   20   25   30   35   40   45   50   55   60

Here's the corresponding arv-put graph:

     KiB                      (RX Bytes/second)                                          MiB                      (TX Bytes/second)
 1220.81 .......................................................|....                  25.25 .........................|......|...........................
 1017.34 ..........................|............................|....                  21.05 ...|.....................|.|....|.|....|.........|.....|....
  813.87 ...........|............|.|.............|.|............|....                  16.84 ...|..........|....|.....|.|....|.|....|.|.......|.....|....
  610.41 ...|...|..||............|||.....|.......|.|............|..|.                  12.63 ..||||.....|||||...|||...|||....|||....||||......|||...|.|..
  406.94 ..||...||.|||||....|....||||....|.|....||||......|.|...|..|.                   8.42 ..||||.....|||||...|||..|||||...||||...||||....|||||..||||.|
  203.47 :||||||||:||||||:::|||::|||||::|||||:::||||:|::|||||::||||||                   4.21 :||||||::::||||||::||||:|||||::|||||::|||||||:||||||::||||||
         1   5   10   15   20   25   30   35   40   45   50   55   60                        1   5   10   15   20   25   30   35   40   45   50   55   60

All of the Illumina sequencer outputs are pretty similar: ~600 GB in ~242,000 files, the bulk of which are ~238,000 gzipped BCL files that range in size from 2 MB to 4 MB with the following size distribution:

 202477 3 MB
  33461 4 MB
   2141 2 MB
      1 1 MB
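Because the files are so small, the target bandwidths translate into a per-file rate. The sketch below derives that from the histogram above; the bucket labels are rounded sizes, so the mean is only approximate, and the helper name is illustrative.

```python
# Size histogram copied from the listing above: bucket label in MB -> file count.
histogram = {3: 202477, 4: 33461, 2: 2141, 1: 1}

bcl_files = sum(histogram.values())
# Bucket labels are rounded sizes, so this mean is only an approximation.
mean_bucket_mb = sum(size * n for size, n in histogram.items()) / bcl_files

def files_per_second(rate_mb_per_s: float, mean_file_mb: float) -> float:
    """Files that must complete each second to sustain the given bandwidth."""
    return rate_mb_per_s / mean_file_mb

print(f"{bcl_files} files, mean bucket size {mean_bucket_mb:.2f} MB")
for rate in (6, 25, 50):
    print(f"{rate:3d} MB/s needs about {files_per_second(rate, mean_bucket_mb):.1f} files/s")
```

Under these assumptions, the 25 MB/s goal means completing several files per second, so any fixed per-file overhead matters a great deal.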

The files are grouped in directories of about 300 MB each, like this:

79058    Data/Intensities/BaseCalls/L005
310    Data/Intensities/BaseCalls/L005/C309.1
310    Data/Intensities/BaseCalls/L005/C308.1

The blobxfer utility uses 6 worker threads by default. Judging from the gaps in its bandwidth graph, that's not sufficient to cover all the latency with these small file sizes, but arv-put is doing much worse.
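The effect of worker threads on latency-bound transfers can be illustrated with a small simulation. This is only a sketch, not how arv-put or blobxfer is implemented: `upload_one` is a hypothetical stand-in that sleeps to mimic a per-file round trip, and the file names and the 50 ms latency are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def upload_one(path_and_size, latency_s=0.05):
    """Hypothetical stand-in for a real upload: sleeps to mimic per-file latency."""
    path, size = path_and_size
    time.sleep(latency_s)
    return path, size

def upload_all(files, workers):
    """Upload files through a fixed-size thread pool; returns (bytes_sent, elapsed_s)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent = sum(size for _, size in pool.map(upload_one, files))
    return sent, time.monotonic() - start

# Made-up file list: 24 files of about 3 MB each.
files = [(f"C1.1/file_{i}.bcl.gz", 3_000_000) for i in range(24)]

for workers in (1, 6):
    sent, elapsed = upload_all(files, workers)
    print(f"{workers} worker(s): {sent} bytes in {elapsed:.2f} s")
```

With one worker the per-file latencies add up serially, while six workers overlap them, which is the behaviour the gaps in the bandwidth graphs hint at.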


Files

arv-put perf.ods (33.2 KB), Lucas Di Pentima, 01/13/2017 07:27 PM

Subtasks 2 (0 open, 2 closed)

Task #11008: Review 10813-arv-put-six-threads, Resolved, Peter Amstutz, 01/30/2017
Task #10818: Review 10813-arv-put-six-threads, Resolved, Peter Amstutz, 01/04/2017