Feature #21606
closedconfigurable keep-web output buffer to reduce delay between blocks
Description
According to #18961, now that #2960 has reduced the TTFB for fetching a block, predicting and pre-fetching the next block appears to be more complex than it's worth.
Instead, in a typical scenario where the backend (keepstore→keep-web) bandwidth is faster than the frontend (keep-web→client), keep-web can reduce or eliminate the between-block delay by writing to an asynchronous output buffer. While keep-web is waiting a few milliseconds for the next block to start arriving from the backend, the client continues to receive the data that has accumulated in the output buffer.
The size of the output buffer should be configurable.
Files
Updated by Tom Clegg 9 months ago
- Related to Feature #18961: Go FileSystem / FUSE mount supports block prefetch added
Updated by Tom Clegg 9 months ago
- Related to Idea #18342: Keep performance optimization added
Updated by Tom Clegg 9 months ago
- File buffering.png buffering.png added
- File buffering-after-warmup.png buffering-after-warmup.png added
There's a noticeable "warm-up" effect on the keepstore/s3 side: if I wait a few minutes between sets of trials, the first few trials tend to go something like {42.3, 77.9, 116, 119, ...} and then stay somewhat consistent after that.
The left chart includes all of the trials. The right chart drops the first 3 data points from each set of trials, to (mostly?) isolate the backend warm-up effect.
- main: main branch, no buffering code
- 0: 902598263d with
WebDAVOutputBuffer: 0
- 1M: 902598263d with
WebDAVOutputBuffer: 1M
- 4M: 902598263d with
WebDAVOutputBuffer: 4M
- 16M: 902598263d with
WebDAVOutputBuffer: 16M
- The difference between "main" and "0" looks non-zero but I am pretty sure this is just an indication of how much variation between 15-sample trials is entirely attributable to cloud weather.
- These results doesn't look especially compelling. However, unlike #18961, this feature (a) is easy to tune or disable via config, (b) does not add much complexity (it's very modular), and (c) is not susceptible to the "slow down first block by starting second block too soon" effect. Also, tordo might just not have a performance profile that makes this feature valuable. I could see merging it (perhaps with a smaller/zero default) so we/others can do experiments in different environments.
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Updated by Tom Clegg 9 months ago
- File buffering-after-warmup-c5large-2.png buffering-after-warmup-c5large-2.png added
- File buffering-after-warmup-c5large.png buffering-after-warmup-c5large.png added
repeated tests 2x with keep-web node upgraded to a c5.large:
Updated by Tom Clegg 9 months ago
After trying and failing to convince myself that a non-zero output buffer can ever improve download speed on an AWS installation, I'm inclined to change the default buffer size to 0 and merge this. That way we/others will be able to experiment with it in other deployment scenarios.
Updated by Tom Clegg 9 months ago
21606-keep-web-output-buffer @ 6106db019b1385e3091b8a01e88bcd702ec6ef9f -- developer-run-tests: #4132
Updated by Tom Clegg 9 months ago
21606-keep-web-output-buffer @ e38a4d340e23f967d2a8361aba3c2a432c0fe321 -- developer-run-tests: #4164
- All agreed upon points are implemented / addressed.
- ✅
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described
- ✅ see test results above
- Documentation has been updated.
- ✅ just the comments in the config template -- should this be mentioned on the keep-web install page / anywhere else?
- Behaves appropriately at the intended scale (describe intended scale).
- n/a
- Considered backwards and forwards compatibility issues between client and server.
- n/a
- Follows our coding standards and GUI style guidelines.
- ✅
Updated by Brett Smith 8 months ago
Tom Clegg wrote in #note-10:
21606-keep-web-output-buffer @ e38a4d340e23f967d2a8361aba3c2a432c0fe321 -- developer-run-tests: #4164
Just a couple small things:
- The comment on WebDAVOutputBuffer leaves me wondering what the unit is. Can that be added?
- The signature of
slowCopy
feels backwards to me (it's literally the opposite of thecp
command). If this order is common in the Go stdlib, or I guess even in the Arvados GoSDK, please leave it. If not, consider swappingsrc
anddst
?
- Documentation has been updated.
- ✅ just the comments in the config template -- should this be mentioned on the keep-web install page / anywhere else?
My personal opinion on this is: if there are cases where we know it should be increased, then yes, that sounds good. If we can't specifically identify any such cases, then no. When I'm doing ops, documentation along the lines of "You might want to change this value" without any context for when I should change it, or to what, drive me batty and feel unhelpful. It feels like it's there more for the developers to fob off users than to actually be helpful.
Thanks.
Updated by Tom Clegg 8 months ago
Brett Smith wrote in #note-12:
- The comment on WebDAVOutputBuffer leaves me wondering what the unit is. Can that be added?
Added:
#
# Size be specified as a number of bytes ("0") or with units
# ("128KiB", "1 MB").
- The signature of
slowCopy
feels backwards to me (it's literally the opposite of thecp
command). If this order is common in the Go stdlib, [...]
Yes, this is a Go thing -- the builtin copy() and stdlib io.Copy() both use (dst, src).
My personal opinion on this is: if there are cases where we know it should be increased, then yes, that sounds good. If we can't specifically identify any such cases, then no. When I'm doing ops, documentation along the lines of "You might want to change this value" without any context for when I should change it, or to what, drive me batty and feel unhelpful. It feels like it's there more for the developers to fob off users than to actually be helpful.
Agreed. Leaving it out.
Updated by Tom Clegg 8 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|086fdccf436bb68d38f482c4e1418c1290ed7c0c.