Bug #10813

Updated by Tom Morris almost 5 years ago

Uploading BCL files using arv-put only achieves 5-10 MB/s while using 35% CPU. The transfer rate is too slow and the CPU usage is too high. Performance also appears to drift down steadily over the course of an upload, which perhaps indicates an issue with processing large manifests.
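
To make the manifest hypothesis concrete, here is a toy model. It is an assumption for illustration, not a description of arv-put's internals. Each file costs a fixed upload time plus extra work proportional to the number of manifest entries already recorded, as would happen if something re-processed the whole manifest after every file. `TRANSFER_S` and `PER_ENTRY_S` are hypothetical constants, picked only so the output lands in the same 5-8 MB/s range as the bmon graph.

```python
from typing import List

# Hypothetical toy model: per-file cost = fixed transfer time
# + work proportional to the number of manifest entries so far.
TRANSFER_S = 0.37     # hypothetical seconds to upload one ~3 MB file
PER_ENTRY_S = 1e-6    # hypothetical seconds of manifest work per existing entry
FILE_MB = 3.0         # typical BCL file size from this report


def throughput_by_decile(n_files: int) -> List[float]:
    """Return the modeled average MB/s for each tenth of an upload of n_files."""
    rates = []
    chunk = n_files // 10
    for d in range(10):
        start, end = d * chunk, (d + 1) * chunk
        # File i pays the fixed cost plus work for the i entries already present.
        seconds = sum(TRANSFER_S + PER_ENTRY_S * i for i in range(start, end))
        rates.append(chunk * FILE_MB / seconds)
    return rates


if __name__ == "__main__":
    for d, rate in enumerate(throughput_by_decile(238_000)):
        print(f"decile {d}: {rate:.2f} MB/s")
```

In this model total upload time grows quadratically with the file count, so throughput declines steadily (here from about 7.9 MB/s in the first tenth to about 5.0 MB/s in the last) rather than dropping suddenly.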

Here's an hourly bandwidth graph from bmon covering the last 60 hours:

<pre>
    MiB (RX Bytes/hour)                                                    MiB (TX Bytes/hour)
  48.40 ....................................|||.....................      8.04 .............................|||............................
  40.33 ....................................|||.....................      6.70 .................|||||||||||||||............................
  32.26 ...................................||||.....................      5.36 .....|||||||||||||||||||||||||||.........|||||..............
  24.20 |..................................||||.....................      4.02 .....|||||||||||||||||||||||||||.........|||||..............
  16.13 |..................................||||.....................      2.68 ....|||||||||||||||||||||||||||||.......||||||..............
   8.07 |::::::::::::::::::::::::::::::::::||||:::::::..............      1.34 ::::|||||||||||||||||||||||||||||:::::::||||||..............
        1   5    10   15   20   25   30   35   40   45   50   55   60          1   5    10   15   20   25   30   35   40   45   50   55   60
</pre>

The ~50 MB/s download in hours 39-36 (the x-axis counts back from the present, so hour 1 is the most recent) is from Azure blob storage to a local shell node using the blobxfer utility. The arv-put upload starts at ~8 MB/s in hour 33 and drifts down to ~5 MB/s by hour 6, averaging ~6 MB/s over roughly 30 hours.

The performance goal is at least a 4x improvement to 25 MB/s, but achieving parity with blobxfer (~50 MB/s) would be even better.
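
A quick sanity check on these numbers, assuming a 600 GB run and decimal units (1 GB = 1000 MB): at the observed ~6 MB/s an upload takes about 28 hours, consistent with the ~30 hours in the graph, while 25 MB/s and 50 MB/s would bring it down to roughly 6.7 and 3.3 hours.

```python
DATASET_GB = 600  # typical sequencer output size from this report


def upload_hours(rate_mb_s: float, size_gb: float = DATASET_GB) -> float:
    """Hours needed to move size_gb at a sustained rate_mb_s (decimal units)."""
    return size_gb * 1000 / rate_mb_s / 3600


for label, rate in [("arv-put today", 6), ("goal", 25), ("blobxfer parity", 50)]:
    print(f"{label:16} {rate:>3} MB/s -> {upload_hours(rate):5.1f} h")
```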

Here's a per-second bandwidth graph of a blobxfer transfer:

<pre>
    MiB (RX Bytes/second)                                                  KiB (TX Bytes/second)
  97.84 ..|...........................||....|...............|.......    133.63 .........................|..................................
  81.54 ..||....||...||....||...|||...||....||...||....||...|||...||    111.36 .........................|..........||......................
  65.23 ..|||..|||...|||..|||...|||...|||..|||...||...|||...|||...||     89.09 ..||...|||...|||...||...|||...||....||....|....|....|||....|
  48.92 ..|||..|||...|||..|||...|||...|||..|||...|||..|||...|||..|||     66.82 ..|||..|||...|||..|||...|||...|||..|||...||...|||...|||...||
  32.61 ..|||..|||..||||..||||..|||..||||..|||..||||..||||..|||..|||     44.54 ..|||..|||...|||..||||..|||...|||..|||...|||..||||..|||..|||
  16.31 :||||||||||:||||::||||:|||||:||||::||||:|||||:||||::|||::|||     22.27 ::|||||||||:||||::||||||||||:||||::||||:|||||:||||::|||::|||
        1   5    10   15   20   25   30   35   40   45   50   55   60          1   5    10   15   20   25   30   35   40   45   50   55   60
</pre>

Here's the corresponding per-second graph for arv-put:

<pre>
    KiB (RX Bytes/second)                                                  MiB (TX Bytes/second)
1220.81 .......................................................|....     25.25 .........................|......|...........................
1017.34 ..........................|............................|....     21.05 ...|.....................|.|....|.|....|.........|.....|....
 813.87 ...........|............|.|.............|.|............|....     16.84 ...|..........|....|.....|.|....|.|....|.|.......|.....|....
 610.41 ...|...|..||............|||.....|.......|.|............|..|.     12.63 ..||||.....|||||...|||...|||....|||....||||......|||...|.|..
 406.94 ..||...||.|||||....|....||||....|.|....||||......|.|...|..|.      8.42 ..||||.....|||||...|||..|||||...||||...||||....|||||..||||.|
 203.47 :||||||||:||||||:::|||::|||||::|||||:::||||:|::|||||::||||||      4.21 :||||||::::||||||::||||:|||||::|||||::|||||||:||||||::||||||
        1   5    10   15   20   25   30   35   40   45   50   55   60          1   5    10   15   20   25   30   35   40   45   50   55   60
</pre>

All of the Illumina sequencer outputs are pretty similar: ~600 GB in ~242,000 files, the bulk of which are ~238,000 gzipped BCL files, almost all between 2 MB and 4 MB, with the following size distribution:

<pre>
 Files  Size
202477  3 MB
 33461  4 MB
  2141  2 MB
     1  1 MB
</pre>
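
A distribution like this can be reproduced with a short script. This sketch assumes sizes are bucketed by rounding up to whole MiB and that the gzipped BCL files end in `.bcl.gz`; the report doesn't say which tool or rounding produced the table above, so counts from this script may differ slightly.

```python
import math
import os
from collections import Counter

MIB = 1024 * 1024


def size_histogram(root: str, suffix: str = ".bcl.gz") -> Counter:
    """Count files under root ending in suffix, bucketed by size rounded up to whole MiB."""
    counts: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                size = os.path.getsize(os.path.join(dirpath, name))
                # Round up so a 2.5 MiB file lands in the "3 MB" bucket.
                counts[max(1, math.ceil(size / MIB))] += 1
    return counts
```

For example, `size_histogram("Data/Intensities/BaseCalls")` walks every lane and cycle directory; sorting its items by count gives a table in the same shape as the one above.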

The files are grouped in directories of about 300 MB each (sizes below in MB), like this:

<pre>
79058 Data/Intensities/BaseCalls/L005
310 Data/Intensities/BaseCalls/L005/C309.1
310 Data/Intensities/BaseCalls/L005/C308.1
</pre>

By default, blobxfer uses 6 worker threads, and the gaps in its bandwidth graph suggest that this isn't enough to hide the per-file latency with files this small, but arv-put is doing much worse.
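
One way to reason about those gaps is a simple latency model (an assumption, not a measurement): if each file pays a fixed per-request overhead on top of its transfer time, one worker delivers `file_mb / (overhead_s + file_mb / stream_mb_s)`, and N concurrent workers multiply that until the link saturates. The overhead, per-stream bandwidth, and link cap below are hypothetical placeholders; only the 6-worker default and the ~3 MB file size come from this report.

```python
def aggregate_mb_s(workers: int, file_mb: float, overhead_s: float,
                   stream_mb_s: float, link_mb_s: float) -> float:
    """Aggregate throughput with `workers` files in flight at once.

    Each file costs a fixed overhead plus its transfer time on one stream;
    the total is capped by the link bandwidth.
    """
    per_file_s = overhead_s + file_mb / stream_mb_s
    return min(link_mb_s, workers * file_mb / per_file_s)


# Hypothetical parameters for illustration only.
FILE_MB = 3.0
OVERHEAD_S = 0.25
STREAM_MB_S = 20.0
LINK_MB_S = 100.0

for n in (1, 2, 6, 12, 24):
    rate = aggregate_mb_s(n, FILE_MB, OVERHEAD_S, STREAM_MB_S, LINK_MB_S)
    print(f"{n:>2} workers: {rate:6.1f} MB/s")
```

Under these placeholder numbers throughput scales linearly with the worker count until it hits the link cap. The point is that with ~3 MB files, per-request overhead rather than raw bandwidth sets the ceiling unless enough requests are in flight.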
