Idea #7780
[FUSE] Write a FUSE performance pipeline (closed)
Description
Write a pipeline that copies data (5-10GB) from the FUSE mount to a local temporary file. Test each of the following methods in sequence in the pipeline:
- shell 'cp' from mount
- Python shutil.copy from mount
- arv-get
- A Python loop that copies from a CollectionReader() to an open file
Write a message after each copy is done so it's possible to collect timing stats up to that point. Then do a checksum on the file and verify that it's not corrupt.
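A minimal sketch of what the benchmark task could look like. The mount point and destination are placeholder paths, the locator/filename are the ones from the strace example further down, and the CollectionReader API has changed across SDK versions, so the open()/read() calls should be checked against the installed arvados-python-client:

# Sketch of the benchmark task.  Mount point and destination are placeholders;
# the locator/filename come from the strace example later in this ticket.
import hashlib
import shutil
import subprocess
import time

import arvados

LOCATOR = '0b2b39e4da4b51d40dcfc12f47af4e04+1249'
FILENAME = 'lobSTR_ref.fasta'
MOUNT_FILE = '/keep/%s/%s' % (LOCATOR, FILENAME)   # assumes arv-mount at /keep
DEST = '/tmp/fuse-perf-test.out'

def timed(label, func):
    # Print a message after each copy so timing stats can be collected
    # up to that point.
    start = time.time()
    func()
    print('%s: %.3f' % (label, time.time() - start))

def copy_from_collectionreader():
    # Plain read loop from a CollectionReader() into an open local file.
    reader = arvados.CollectionReader(LOCATOR)
    src = reader.open(FILENAME)   # check the exact signature/return type for your SDK
    with open(DEST, 'wb') as dst:
        while True:
            chunk = src.read(1 << 20)
            if not chunk:
                break
            dst.write(chunk)

timed('cp', lambda: subprocess.check_call(['cp', MOUNT_FILE, DEST]))
timed('shutil', lambda: shutil.copy(MOUNT_FILE, DEST))
timed('arv-get', lambda: subprocess.check_call(
    ['arv-get', '%s/%s' % (LOCATOR, FILENAME), DEST]))
timed('collection', copy_from_collectionreader)

# Verify the last copy is not corrupt.
md5 = hashlib.md5()
with open(DEST, 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):
        md5.update(chunk)
print('md5: %s' % md5.hexdigest())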
Updated by Brett Smith about 9 years ago
- Description updated (diff)
- Category set to FUSE
Updated by Peter Amstutz about 9 years ago
1st run
arv-get: 54.547s
collection: 39.7725520134
cp: 39.635s
shutil: 37.7655129433
Updated by Peter Amstutz about 9 years ago
2nd run
arv-get: 1m41.320s
collection: 38.4127969742
cp: 40.311s
shutil: 37.6932270527
3rd & 4th run
arv-get: 40.937s
arv-get: 41.551s
Updated by Peter Amstutz about 9 years ago
Seems to be averaging 45 MiB/s
The first one (arv-get) seems to be affected by cache freshness issues; running it last gives better numbers.
cpu utilization:
arv-get: cpu 10.5500 user 10.1400 sys
collection: cpu 12.5100 user 10.5600 sys
cp: cpu 0.0200 user 1.7600 sys
shutil: cpu 2.5400 user 2.3600 sys
I think this just means arv-mount's CPU utilization isn't being charged to the copy commands like it is to arv-get and CollectionReader.
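One way to check that would be to sample arv-mount's own accumulated CPU time around each copy and report it next to the child's rusage. A rough sketch, assuming a single arv-mount process that pgrep can find, and using placeholder paths:

# Attribute arv-mount's CPU time to a copy command by sampling
# /proc/<pid>/stat before and after the copy runs.
import os
import resource
import subprocess

def proc_cpu_seconds(pid):
    # Fields 14 and 15 of /proc/<pid>/stat are utime and stime in clock ticks
    # (naive split; fine as long as the comm field has no spaces).
    with open('/proc/%d/stat' % pid) as f:
        fields = f.read().split()
    ticks = int(fields[13]) + int(fields[14])
    return ticks / float(os.sysconf('SC_CLK_TCK'))

# Assumes a single arv-mount process; adjust the lookup for your setup.
mount_pid = int(subprocess.check_output(['pgrep', '-n', 'arv-mount']).decode().strip())

mount_before = proc_cpu_seconds(mount_pid)
subprocess.check_call(['cp', '/keep/some/file', '/tmp/copy-test.out'])  # placeholder paths
child = resource.getrusage(resource.RUSAGE_CHILDREN)
mount_after = proc_cpu_seconds(mount_pid)

print('cp itself: %.2f user %.2f sys' % (child.ru_utime, child.ru_stime))
print('arv-mount during the copy: %.2f cpu' % (mount_after - mount_before))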
Updated by Peter Amstutz about 9 years ago
strace -c arv-get 0b2b39e4da4b51d40dcfc12f47af4e04+1249/lobSTR_ref.fasta .

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31.94    1.363753           2    808788           clock_gettime
 26.46    1.129736          10    115231           recvfrom
 23.16    0.988879         275      3601           write
  9.10    0.388444           2    230682           poll
Not sure why it's calling clock_gettime 800k times. Also, recvfrom() is only reading 16k at a time, which seems too small for the data transfer rate we're trying to achieve.
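The 16384-byte reads match libcurl's default receive buffer size (CURLOPT_BUFFERSIZE). A cheap experiment, outside of arv-get itself, is to fetch one block with pycurl and a larger buffer and see whether the recvfrom() sizes change under strace; the keepstore URL and auth handling below are placeholders, and whether libcurl honors a larger value depends on the installed version:

# Fetch a single block with a larger receive buffer (placeholder URL/locator).
import pycurl

BLOCK_URL = 'http://keep0.example.arvadosapi.com:25107/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa+67108864'

with open('/tmp/block.out', 'wb') as out:
    c = pycurl.Curl()
    c.setopt(pycurl.URL, BLOCK_URL)
    # Add whatever Authorization header the Keep server expects here.
    c.setopt(pycurl.WRITEDATA, out)
    # Ask libcurl for a 256 KiB receive buffer instead of the 16 KiB default;
    # whether this is honored depends on the libcurl version.
    c.setopt(pycurl.BUFFERSIZE, 256 * 1024)
    c.perform()
    c.close()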
Updated by Ward Vandewege about 9 years ago
Peter Amstutz wrote:
[...] Not sure why it's calling clock_gettime 800k times.

Known issue; you need to set an env var. See
http://serverfault.com/questions/452833/how-to-resolve-high-cpu-excessive-stat-etc-localtime-and-clock-gettimeclo
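For reference, the workaround discussed at that link is just setting TZ before the process starts, so glibc stops re-reading the timezone file on every localtime() call. A minimal way to apply it from the benchmark script (the value shown is a commonly suggested one, not something prescribed by this ticket):

# Set TZ so glibc doesn't stat /etc/localtime on every time lookup.
import os
import time

os.environ['TZ'] = ':/etc/localtime'   # example value; adjust for the deployment
time.tzset()                           # make the running process pick up the change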
Updated by Ward Vandewege about 9 years ago
Ward Vandewege wrote:
Peter Amstutz wrote:
[...] Not sure why it's calling clock_gettime 800k times.
Known issue; you need to set an env var. See [...]
Hmm, maybe I spoke too soon. We set TZ to avoid the excessive stat'ing of /etc/localtime in some places on our clusters. But it doesn't seem to help with all the clock_gettime calls.
Updated by Peter Amstutz about 9 years ago
The inner receiver loop (which is presumably all libcurl or pycurl) looks like this:
poll([{fd=11, events=POLLIN}], 1, 1000) = 1 ([{fd=11, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {4137158, 906951004}) = 0
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
recvfrom(11, "TTTTCAGGGTTTCACTCTGTCACCCAGGCTGG"..., 16384, 0, NULL, NULL) = 16384
clock_gettime(CLOCK_MONOTONIC, {4137158, 907006204}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907018404}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907030204}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907042004}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907053504}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907065504}) = 0
poll([{fd=11, events=POLLIN}], 1, 1000) = 1 ([{fd=11, revents=POLLIN}])
clock_gettime(CLOCK_MONOTONIC, {4137158, 907091504}) = 0
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
recvfrom(11, "TTCTCAACATTTCGCCTAGCAACTCCCCCAGG"..., 16384, 0, NULL, NULL) = 16384
clock_gettime(CLOCK_MONOTONIC, {4137158, 907144704}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907157004}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907168804}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907192504}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907204004}) = 0
clock_gettime(CLOCK_MONOTONIC, {4137158, 907215904}) = 0
poll([{fd=11, events=POLLIN}], 1, 1000) = 1 ([{fd=11, revents=POLLIN}])
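That is roughly six or seven clock_gettime calls around every 16 KiB recvfrom, which matches the strace -c counts above (808788 / 115231 ≈ 7 per read). A small sketch for tallying syscalls from a raw trace, assuming the output was saved with strace -o trace.log:

# Count syscalls per recvfrom() in a raw strace log to quantify the
# clock_gettime overhead per 16 KiB read.
import collections
import re

counts = collections.Counter()
with open('trace.log') as log:
    for line in log:
        m = re.match(r'(?:\[pid\s+\d+\]\s+)?(\w+)\(', line)
        if m:
            counts[m.group(1)] += 1

reads = counts['recvfrom'] or 1
for name, n in counts.most_common(10):
    print('%-16s %8d  (%.1f per recvfrom)' % (name, n, float(n) / reads))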
Updated by Nico César about 9 years ago
https://workbench.c97qk.arvadosapi.com/pipeline_instances/c97qk-d1hrv-ary5ydq9enxrs6s#
Created a 2nd keep server; testing now.
Updated by Nico César about 9 years ago
I ran this job and it works:
https://workbench.c97qk.arvadosapi.com/pipeline_instances/c97qk-d1hrv-ary5ydq9enxrs6s#
c97qk (Azure blob store, 2 keepstore servers):
throwaway: 149.83417511
arv-get: 106.32137084
CollectionReader: 104.115087986
cp: 105.585179806
shutil: 96.5133240223
LGTM, for what it's worth.
As a wishlist item, I'd like to add more data to the final table so we can compare runs over time (see the sketch after this list), like:
About the run itself:
- bytes sent
- cpu (us, sy, id, wa, st if possible)
- peak memory usage (whatever is the easiest way to measure this)
- io activity (whatever is the easiest way to measure this)
- latency to the rest of the components (e.g. an average of 5 pings to the API server, keep servers, etc.)
About the environment:
- Keep servers/services available
- versions of the Arvados components
- /proc/cpuinfo and lshw or something similar, to give us a clue which cloud and instance size we're dealing with (ideally the API server should provide all these details, including the size of all the components)
- Keep block size
There is more to add, but I'll be happy if we just start with that and collect the info over time.
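A sketch of how a few of those columns could be collected from inside the task with just the standard library; the hostnames and copy path are placeholders, and the memory/latency numbers are of the "easiest way to measure this" variety rather than precise accounting:

# Collect a few of the wishlist metrics around a single copy run.
import platform
import resource
import subprocess
import time

def ping_ms(host, count=5):
    # Average of a few pings (Linux ping output); returns None if unreachable.
    try:
        out = subprocess.check_output(['ping', '-c', str(count), '-q', host]).decode()
    except subprocess.CalledProcessError:
        return None
    for line in out.splitlines():
        if 'min/avg/max' in line:
            return float(line.split('=')[1].split('/')[1])
    return None

start = time.time()
subprocess.check_call(['cp', '/keep/some/file', '/tmp/copy-test.out'])  # placeholder copy
elapsed = time.time() - start

usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print('wall: %.2fs  user: %.2fs  sys: %.2fs' % (elapsed, usage.ru_utime, usage.ru_stime))
print('peak RSS of children: %d KiB' % usage.ru_maxrss)
print('node: %s' % platform.platform())
with open('/proc/cpuinfo') as f:
    print('cpus: %d' % f.read().count('processor\t:'))
for host in ['api.example.arvadosapi.com', 'keep0.example.arvadosapi.com']:
    print('ping %s: %s ms' % (host, ping_ms(host)))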
Updated by Peter Amstutz about 9 years ago
"throwaway" task
4xphq: 55.6503009796, 56.6268150806, 58.0175321102
c97qk: 106.220340014, 85.2707979679, 106.330720901, 113.038635015
qr1hi: 38.397521019 101s
Updated by Peter Amstutz about 9 years ago
4xphq:
warmup: 56.9406909943
arvget: 56.6816000938
qr1hi:
warmup: 434.597276211
arvget: 40.413373947