Project

General

Profile

Actions

Feature #21606

closed

configurable keep-web output buffer to reduce delay between blocks

Added by Tom Clegg 8 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Story points:
-
Release:
Release relationship:
Auto

Description

According to #18961, now that #2960 has reduced the TTFB for fetching a block, predicting and pre-fetching the next block appears to be more complex than it's worth.

Instead, in a typical scenario where the backend (keepstore→keep-web) bandwidth is faster than the frontend (keep-web→client), keep-web can reduce or eliminate the between-block delay by writing to an asynchronous output buffer. While keep-web is waiting a few milliseconds for the next block to start arriving from the backend, the client continues to receive the data that has accumulated in the output buffer.

The size of the output buffer should be configurable.


Files

buffering.png (17.8 KB) buffering.png Tom Clegg, 03/21/2024 05:49 PM
buffering-after-warmup.png (19 KB) buffering-after-warmup.png Tom Clegg, 03/21/2024 05:49 PM
buffering-after-warmup-c5large.png (16.6 KB) buffering-after-warmup-c5large.png Tom Clegg, 03/27/2024 02:15 AM
buffering-after-warmup-c5large-2.png (20.4 KB) buffering-after-warmup-c5large-2.png Tom Clegg, 03/27/2024 03:35 PM

Subtasks 1 (1 open0 closed)

Task #20437: Review 21606-keep-web-output-bufferIn ProgressTom Clegg04/19/2024Actions

Related issues

Related to Arvados - Feature #18961: Go FileSystem / FUSE mount supports block prefetchClosedTom CleggActions
Related to Arvados Epics - Idea #18342: Keep performance optimizationNew08/01/202312/31/2024Actions
Actions #1

Updated by Tom Clegg 8 months ago

  • Related to Feature #18961: Go FileSystem / FUSE mount supports block prefetch added
Actions #3

Updated by Tom Clegg 8 months ago

  • Related to Idea #18342: Keep performance optimization added
Actions #4

Updated by Tom Clegg 8 months ago

There's a noticeable "warm-up" effect on the keepstore/s3 side: if I wait a few minutes between sets of trials, the first few trials tend to go something like {42.3, 77.9, 116, 119, ...} and then stay somewhat consistent after that.

The left chart includes all of the trials. The right chart drops the first 3 data points from each set of trials, to (mostly?) isolate the backend warm-up effect.

  • main: main branch, no buffering code
  • 0: 902598263d with WebDAVOutputBuffer: 0
  • 1M: 902598263d with WebDAVOutputBuffer: 1M
  • 4M: 902598263d with WebDAVOutputBuffer: 4M
  • 16M: 902598263d with WebDAVOutputBuffer: 16M
Some observations:
  • The difference between "main" and "0" looks non-zero but I am pretty sure this is just an indication of how much variation between 15-sample trials is entirely attributable to cloud weather.
  • These results doesn't look especially compelling. However, unlike #18961, this feature (a) is easy to tune or disable via config, (b) does not add much complexity (it's very modular), and (c) is not susceptible to the "slow down first block by starting second block too soon" effect. Also, tordo might just not have a performance profile that makes this feature valuable. I could see merging it (perhaps with a smaller/zero default) so we/others can do experiments in different environments.
Actions #5

Updated by Tom Clegg 8 months ago

above tests were done with keep0.tordo on a c5.large and workbench.tordo (keep-web) on a t3a.medium

Actions #6

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Actions #7

Updated by Tom Clegg 8 months ago

repeated tests 2x with keep-web node upgraded to a c5.large:

Actions #8

Updated by Tom Clegg 8 months ago

After trying and failing to convince myself that a non-zero output buffer can ever improve download speed on an AWS installation, I'm inclined to change the default buffer size to 0 and merge this. That way we/others will be able to experiment with it in other deployment scenarios.

Actions #10

Updated by Tom Clegg 7 months ago

21606-keep-web-output-buffer @ e38a4d340e23f967d2a8361aba3c2a432c0fe321 -- developer-run-tests: #4164

  • All agreed upon points are implemented / addressed.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • ✅ see test results above
  • Documentation has been updated.
    • ✅ just the comments in the config template -- should this be mentioned on the keep-web install page / anywhere else?
  • Behaves appropriately at the intended scale (describe intended scale).
    • n/a
  • Considered backwards and forwards compatibility issues between client and server.
    • n/a
  • Follows our coding standards and GUI style guidelines.
Actions #11

Updated by Tom Clegg 7 months ago

  • Target version changed from Development 2024-04-10 sprint to Development 2024-04-24 sprint
Actions #12

Updated by Brett Smith 7 months ago

Tom Clegg wrote in #note-10:

21606-keep-web-output-buffer @ e38a4d340e23f967d2a8361aba3c2a432c0fe321 -- developer-run-tests: #4164

Just a couple small things:

  • The comment on WebDAVOutputBuffer leaves me wondering what the unit is. Can that be added?
  • The signature of slowCopy feels backwards to me (it's literally the opposite of the cp command). If this order is common in the Go stdlib, or I guess even in the Arvados GoSDK, please leave it. If not, consider swapping src and dst?
  • Documentation has been updated.
    • ✅ just the comments in the config template -- should this be mentioned on the keep-web install page / anywhere else?

My personal opinion on this is: if there are cases where we know it should be increased, then yes, that sounds good. If we can't specifically identify any such cases, then no. When I'm doing ops, documentation along the lines of "You might want to change this value" without any context for when I should change it, or to what, drive me batty and feel unhelpful. It feels like it's there more for the developers to fob off users than to actually be helpful.

Thanks.

Actions #13

Updated by Tom Clegg 7 months ago

Brett Smith wrote in #note-12:

  • The comment on WebDAVOutputBuffer leaves me wondering what the unit is. Can that be added?

Added:

      #
      # Size be specified as a number of bytes ("0") or with units
      # ("128KiB", "1 MB").
  • The signature of slowCopy feels backwards to me (it's literally the opposite of the cp command). If this order is common in the Go stdlib, [...]

Yes, this is a Go thing -- the builtin copy() and stdlib io.Copy() both use (dst, src).

My personal opinion on this is: if there are cases where we know it should be increased, then yes, that sounds good. If we can't specifically identify any such cases, then no. When I'm doing ops, documentation along the lines of "You might want to change this value" without any context for when I should change it, or to what, drive me batty and feel unhelpful. It feels like it's there more for the developers to fob off users than to actually be helpful.

Agreed. Leaving it out.

Actions #14

Updated by Tom Clegg 7 months ago

  • Status changed from In Progress to Resolved
Actions #15

Updated by Peter Amstutz 7 months ago

  • Release set to 70
Actions

Also available in: Atom PDF