Bug #18941
closedarv-mount/python SDK performance should be better
Updated by Peter Amstutz almost 3 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz almost 3 years ago
- Subject changed from arv-mount performance should be better to arv-mount/python SDK performance should be better
Updated by Peter Amstutz almost 3 years ago
- Release set to 46
I'm curious how the performance improves with a local keepstore server
Goofys uses a 400 MB readahead buffer with 20 MB chunks.
Updated by Peter Amstutz almost 3 years ago
18941-arv-prefetch @ d3051d45df5ee760a05a84bf4d385799c4326477
Found bugs in the prefetch logic:
When checking to see if a block was already in cache or should be prefetched it was:
- using get_from_cache() which only returns true if the block is fully downloaded, but what we care about is whether there is an in-flight GET request or not.
- using the signed identifier, but blocks are keyed on the bare md5 sum.
So, it would always think the block was not yet in cache, issuing numerous prefetch requests for the same block over and over, creating a traffic jam where all the prefetch threads were waiting on the same block instead of skipping duplicates and fetching the next required block.
Fixes:
- Add a special "prefetch" option for KeepClient.get() which (a) starts downloading like normal if the block is not cached or (b) returns immediately if the block already has a cache entry
- This ensures the prefetch requests (a) only initiate downloads for blocks that are not yet in flight (b) does not wait for in-flight requests to complete
- Also removed the separate check to see if the block was in the cache; this could race with other threads and still result in more than one thread waiting on the same block fetch.
- the number of prefetch 'get' threads is now settable, and the prefetch window size is based on the number of prefetch threads
- arv-get has new --threads option, uses 4 threads by default
- arv-mount sets the number of prefetch threads based on the keep cache size
- cleanup: removed an unused "is_cached" method (which could not possibly have ever been used because it contains a fatal typo)
Updated by Peter Amstutz almost 3 years ago
1.7 GiB test file
Time to read whole file | |
arv-get | 0m19.980s |
arv-mount, default 256 MiB cache, local keepstore | 0m25.396s |
arv-mount, increased 384 MiB cache, local keepstore | 0m22.354s |
time arv-get --threads 4 pirca-4zz18-qwyvdfzx68qok1y/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex /dev/null
time dd if=keep/by_id/pirca-4zz18-qwyvdfzx68qok1y/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex of=/dev/nul
Reading the file includes loading and parsing the manifest and creating the Collection object.
With default settings, arv-mount is using 3 threads, but arv-get is using 4. They both use (by default) a 256 MiB cache, the difference is that arv-get is guaranteed to be doing a sequential read, whereas arv-mount is being slightly more conservative to account for other read patterns. We should consider adjusting the clusterwide default value "Container.DefaultKeepCacheRAM: 268435456" (256 Mib) up to 402653184 (384 MiB).
Ward observed that there seems to be a cache warming effect on the S3 side. I'm seeing this too. Reading a file, unmounting it, re-mounting it, then reading it again, the measured performance goes from 130 MB/s (first time) to 180 MB/s (subsequent trials).
Updated by Peter Amstutz almost 3 years ago
- Target version changed from 2022-03-30 Sprint to 2022-04-13 Sprint
Updated by Peter Amstutz almost 3 years ago
I made one small change increasing max threads from 6 to 7 0a9d4fa5043bd7291611c41588bdd3f0b70ede44
Updated by Lucas Di Pentima almost 3 years ago
Some comments & questions:
- "
arv keep get
" documentation may need updating atsdk/cli/subcommands.html
to reflect the new parameter. - I think it would be good to have proper testing for this fix, not having it allowed this bug to exist for so long. OTOH if we need to ship this ASAP, it could be done in a separate branch, wdyt?
- As talked at spring review, I think using the CPU core number as a factor for capping the max number of threads would be convenient. Maybe doing some manual tests on a big node to validate this would be desirable? If you have already done this test and it just works on
arv-get
, how about computing the default get threads depending on the cpu cores?
The rest LGTM
Updated by Peter Amstutz almost 3 years ago
Lucas Di Pentima wrote:
Some comments & questions:
- "
arv keep get
" documentation may need updating atsdk/cli/subcommands.html
to reflect the new parameter.
Will do.
- I think it would be good to have proper testing for this fix, not having it allowed this bug to exist for so long. OTOH if we need to ship this ASAP, it could be done in a separate branch, wdyt?
Yes, we need to ship it.
It's very hard to write a test for. There was already a test that the very basic prefetch behavior was happening, the problem is KeepClient.get() was stubbed out. Since the core of the issue was multiple threads waiting on KeepClient.get() of the same block, the use of a stub would have still hidden the problem. I'll have to think about it.
- As talked at spring review, I think using the CPU core number as a factor for capping the max number of threads would be convenient. Maybe doing some manual tests on a big node to validate this would be desirable? If you have already done this test and it just works on
arv-get
, how about computing the default get threads depending on the cpu cores?
Right now I don't have enough information. My theory is that because of the Python global interpreter lock and the additional work that the FUSE driver does (compared to arv-get), twice as many cores would have twice the lock contention, which would be worse, not better. I don't want to leave it uncapped or use a formula for which I don't have any evidence that it'll actually make it better.
The rest LGTM
Thanks.
Updated by Peter Amstutz almost 3 years ago
Benchmark using dd if=(25 GB keep file) of=/dev/null
2.3.3
https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-amqvijqbdoszk83#Log
2022-03-30T20:47:17.361472192Z 26691600151 bytes (27 GB, 25 GiB) copied, 666.32 s, 40.1 MB/s
2.4.0~rc1
On a 2 core node:
https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-a99km88q375xpiy#Log
2022-03-30T21:02:42.039596608Z 26691600151 bytes (27 GB, 25 GiB) copied, 222.505 s, 120 MB/s
On a 4 core node:
https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-yx8wmx0qddaqxmq#Log
2022-03-30T21:06:55.824180944Z 26691600151 bytes (27 GB, 25 GiB) copied, 152.218 s, 175 MB/s
On a 8 core node:
https://workbench.pirca.arvadosapi.com/container_requests/pirca-xvhdp-30vfuzz3x60j3if#Log
2022-03-30T21:12:09.440730978Z 26691600151 bytes (27 GB, 25 GiB) copied, 137.416 s, 194 MB/s
Updated by Peter Amstutz almost 3 years ago
- Related to Bug #18964: Write better prefetch tests added
Updated by Peter Amstutz almost 3 years ago
Added separate story for testing #18964
Updated by Peter Amstutz almost 3 years ago
- Status changed from In Progress to Resolved