Project

General

Profile

Actions

Feature #2960

open

Keepstore can stream GET and PUT requests using keep-gateway API

Added by Tim Pierce almost 10 years ago. Updated 1 day ago.

Status:
In Progress
Priority:
Normal
Assigned To:
Category:
Keep
Story points:
-

Subtasks 2 (1 open1 closed)

Task #21515: Review 2960-keepstore-streamingResolvedPeter Amstutz02/16/2024Actions
Task #21567: Manually verify all features (read, write, index, pull, trash) appear to be working on tordoIn ProgressTom Clegg03/26/2024Actions

Related issues

Related to Arvados - Idea #9760: Remove intermediary checksum from Keep store and use cut through block forwardingRejected08/08/2016Actions
Related to Arvados - Feature #10541: [Keep] Share buffers between overlapping/consecutive GET requests for the same blockNewActions
Related to Arvados Epics - Idea #18342: Keep performance optimizationNew08/01/202305/30/2024Actions
Blocked by Arvados - Feature #21197: refactor storage drivers to support keep-gateway APIClosedTom Clegg02/13/2024Actions
Actions #1

Updated by Tim Pierce almost 10 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from Keep streams GET and PUT requests to Keep can stream GET and PUT requests

Current implementation of Keep @ a4378cd handles both GET and PUT requests by reading an entire blob (up to 64MB) into memory, performing checksums and any other policy checks, and then delivering the block appropriately. With many concurrent clients, this can bring Keep memory usage up to 500MB and more.

A more memory-efficient solution, and lower latency from the client's point of view, would be to stream data between the client and disk as much as possible with smaller buffers. The keepclient package has some useful tools for this.

Some issues to attend to:
  • When --serialize-io is enabled, Keep must continue to read an entire block and buffer it, or else a single slow client may block other clients while they wait for disk access.
  • GET cannot issue a 402 Corruption error without hashing the block first. Possible solutions include:
    • Read the block twice through a small buffer: once to calculate the checksum, and again to stream it to the client.
    • Deliver any corruption status in the second part of a multipart/mixed message.
    • Abandon any attempt to report corruption to the client and simply log it to the data manager, on the grounds that the client is already responsible for checking for corrupt blocks anyway.
  • PUT checking for collisions, which currently relies on GetBlock, will need to be rethought to handle streaming data.
Actions #2

Updated by Tim Pierce almost 10 years ago

  • Category set to Keep
Actions #3

Updated by Ward Vandewege almost 4 years ago

  • Blocks Idea #16516: Run Keepstore on local compute nodes added
Actions #4

Updated by Peter Amstutz over 2 years ago

  • Related to Idea #18342: Keep performance optimization added
Actions #5

Updated by Peter Amstutz over 2 years ago

  • Blocks deleted (Idea #16516: Run Keepstore on local compute nodes)
Actions #6

Updated by Peter Amstutz about 1 year ago

  • Release set to 60
Actions #7

Updated by Peter Amstutz 7 months ago

  • Release deleted (60)
  • Target version set to Development 2023-10-25 sprint
Actions #8

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-10-25 sprint to Development 2023-11-08 sprint
Actions #9

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Actions #10

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-11-29 sprint to Development 2023-11-08 sprint
Actions #11

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Actions #12

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2023-11-29 sprint to Development 2023-11-08 sprint
Actions #13

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Actions #14

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2023-11-29 sprint to Development 2024-01-03 sprint
Actions #15

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
Actions #16

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-01-17 sprint to Development 2024-01-31 sprint
Actions #17

Updated by Peter Amstutz 4 months ago

  • Subject changed from Keep can stream GET and PUT requests to Keepstore can stream GET and PUT requests using keep-gateway API
Actions #18

Updated by Peter Amstutz 4 months ago

  • Blocked by Feature #21197: refactor storage drivers to support keep-gateway API added
Actions #19

Updated by Peter Amstutz 3 months ago

  • Assigned To set to Tom Clegg
Actions #20

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-01-31 sprint to Development 2024-02-14 sprint
Actions #21

Updated by Tom Clegg 2 months ago

  • Status changed from New to In Progress
Actions #22

Updated by Tom Clegg 2 months ago

2960-keepstore-streaming @ 39f6e9f70f683237d9488faac1c549ca19ac9dae -- developer-run-tests: #4037

This is a substantial refactor of keepstore.
  • Backend drivers are mostly unchanged, except to update the BlockWrite interface (formerly Put) and add a streaming mechanism to Azure/S3 BlockRead (formerly Get) -- see below.
  • The http server is rewritten as two parts: (1) a router that deals with http Requests, ResponseWriters, Headers, etc., and (2) a keepstore that implements the operations BlockRead, BlockWrite, BlockTouch, etc. with typed parameters.
  • Volumes now have access to the buffer pool when reading, so they can use it when needed, but a buffer is not allocated automatically by the higher layers. Practically, this means the unix/filesystem backend, with Serialize: false, will support more clients with a given buffer limit because it doesn't need read buffers.
  • Writing is still store-and-forward for now. Streaming writes while maintaining the md5 checks is possible, but a bit more complicated, and our clients write asynchronously anyway, so I think it makes sense to set this aside for now.
  • status.json and debug.json are removed.
  • Several cleanup/simplification things like
    • Rename S3AWSVolume to S3Volume etc. now that we don't have two separate drivers called "S3" and "S3AWS".
    • Un-export various identifiers that aren't meant to be part of a public interface.
    • Replace the "work queue" abstraction layer with a sync.Cond that notifies workers when the todo list becomes non-empty. Admittedly the pull/trash queues have some nearly identical code now -- but they already had nearly identical code even when they were purportedly sharing the work queue abstraction layer, so I think simpler is better.
    • Tidy up volume interface method names and arguments, e.g., the block read/write funcs take hashes, not locators.
    • Remove "Compare" method from volume interface -- now that volumes do streaming reads, the keepstore layer can call volume.BlockRead and do the compare itself. (but see below)
    • Remove ReadOnly / AllowTrashWhenReadOnly checks from the volume's responsibilities -- now volumes just assume the keepstore layer is doing those checks.
compatibility:
  • Reading now requires hash+size; reading a bare hash doesn't work. AFAICT the only case where this comes up is a pull list sent by a previous version of keep-balance, which (for no good reason) stripped the +size part. Pull lists from old versions of keep-balance are now rejected, which will be logged by keep-balance (rather than N "can't pull block" logs from keepstore).
  • HEAD requests for a bare hash still works. keep-block-check tests rely on this, and it seems worth preserving if only so an admin can get a list of hashes from a backend volume and feed them to keep-block-check without having to attach the file sizes.
  • Some HTTP status codes changed, e.g., 400 instead of 404 when trying to read a locator with an invalid/missing signature. This meant updating some Python test cases. I'm inclined to think nobody IRL is relying on that to return 404.
TODO:
  • The current "compare" implementation has a problem that it uses two buffers for a Write: one for the incoming data and one for Compare. I think the solution is to change the volume's BlockRead interface to accept an io.WriterAt instead of an io.Writer, and put the "reassemble/buffer stream" stuff in the keepstore layer instead of the backend driver layer. Compare should be easy enough to implement as an io.WriterAt.
  • Add back the comments on the volume interface describing requirements for backend implementation
  • Large file download tests on dev cluster
Actions #23

Updated by Peter Amstutz 2 months ago

  • Target version changed from Development 2024-02-14 sprint to Development 2024-02-28 sprint
Actions #24

Updated by Tom Clegg 2 months ago

2960-keepstore-streaming @ 62168c2db5c36de2362cd1d5785b598b187bbef3 -- developer-run-tests: #4046

(sdk/R install is failing on http://cran.wustl.edu/src/contrib/stringi_1.8.3.tar.gz as usual)

  • Fixes "compare" implementation and moves the "reassemble/buffer stream" code out of the driver layer, as described above.
  • Adds back the comments on the volume interface describing requirements for backend implementation
Large file download test results:
  • Setup: tordo (AWS), single keepstore server -> keep-web -> curl (on same machine as keep-web); repeatedly downloading the same 1.8 GB file. Keep-web's filesystem is small enough that the disk cache always misses in this setup.
  • MB/s @ 2.8.0~dev20240215211742: 75, 90, 96, 97, 98, 91, 97, 91 (mean = 92)
  • MB/s @ 62168c: 130, 130, 121, 125, 135, 128, 139, 132, 121 (mean = 129)
  • From a cursory look at keepstore logs, typical timeToStatus reduces from about 0.25 s to 0.03 s.
  • My guess is that the results would be very different on a higher-powered node, but a 40% increase in throughput is somewhat encouraging.
Actions #25

Updated by Peter Amstutz 2 months ago

The new streaming architecture makes sense. The layering makes sense and from what I remember, it is actually significantly simpler than the old code.

It seems that the keepstore struct doesn't implement the KeepGateway interface (there's no ReadAt or LocalLocator). We were talking about standalone keepstore, crunch-run local keepstore, keepproxy, etc just be different stacks of KeepGateway objects (as well as cutting out the middleman by having keep-web read directly from the underlying S3 bucket). Is that still the plan?

Like the proverbial bikeshed, I mostly minor thoughts that came up looking through keepstore.go:

  • Would it make sense to have parseLocator and locatorInfo as part of the SDK? This can't be the only place we split and assemble locators
  • blockReadRemote seems to be doing some redundant work that could be done by parseLocator, I suppose it is not that urgent but maybe a chance to DRY
Actions #26

Updated by Tom Clegg 2 months ago

Peter Amstutz wrote in #note-25:

The new streaming architecture makes sense. The layering makes sense and from what I remember, it is actually significantly simpler than the old code.

Phew, yes. I'm glad it's not just me.

It seems that the keepstore struct doesn't implement the KeepGateway interface (there's no ReadAt or LocalLocator). We were talking about standalone keepstore, crunch-run local keepstore, keepproxy, etc just be different stacks of KeepGateway objects (as well as cutting out the middleman by having keep-web read directly from the underlying S3 bucket). Is that still the plan?

Yes, I'm still hoping for more progress in that direction. For example, keep-web can't yet import keepstore and start using it instead of a keepclient, but from here we can get there without any major refactoring.

  • Would it make sense to have parseLocator and locatorInfo as part of the SDK? This can't be the only place we split and assemble locators

With parseLocator/locatorInfo I was aiming for an efficient way to get the few bits of information needed for keepstore logic.

I see your point though. We do have several variations on this floating around.

Unifying these would be nice but not exactly trivial. I'm not sure whether we should get into it now. I suppose the saving grace is that the format is simple enough for all of those implementations to correctly find what they're looking for, even if they don't all detect the same set of invalid locators.

Meanwhile, now that you made me look, I added a few more test cases and rewrote parseLocator to avoid making a slice of substrings, and renamed it to getLocatorInfo to make the connection to locatorInfo more obvious.

  • blockReadRemote seems to be doing some redundant work that could be done by parseLocator, I suppose it is not that urgent but maybe a chance to DRY

Unlike parseLocator/getLocatorInfo it needs to construct a new locator with all but the +R hint, but I've at least dropped the strconv.Atoi call in favor of calling getLocatorInfo. Explicitly calling getLocatorInfo to establish that the locator is valid certainly seems cleaner.

2960-keepstore-streaming @ dc6b5b04d0b0ea40e9e9caa499f070f95b99fa07 -- developer-run-tests: #4050

(all tests passed except install sdk/R)

Actions #27

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Development 2024-02-28 sprint to Development 2024-03-13 sprint
Actions #28

Updated by Peter Amstutz about 2 months ago

This LGTM, but we should continue to test it on the dev cluster -- I know the read streaming has been tested a lot already, but we should try to confirm that writes, pull lists, trash lists and indexes are working when deployed in a more realistic environment than the unit tests.

Actions #29

Updated by Tom Clegg about 1 month ago

  • Target version changed from Development 2024-03-13 sprint to Development 2024-03-27 sprint
Actions #30

Updated by Tom Clegg about 1 month ago

Read/write/index/trash/emptytrash have evidently been working on tordo:

[
  {
    "ClusterID": "tordo",
    "PID": 211083,
    "Volume": "s3:///tordo-nyw5e-000000000000000-volume",
    "level": "info",
    "msg": "EmptyTrash: stats for s3:///tordo-nyw5e-000000000000000-volume: Deleted 0 bytes in 0 blocks. Remaining in trash: 782676187225 bytes in 12484 blocks.",
    "time": "2024-03-12T20:45:10.641423112Z" 
  },
  {
    "ClusterID": "tordo",
    "PID": 211083,
    "Volume": "s3:///tordo-nyw5e-000000000000000-volume",
    "level": "info",
    "msg": "EmptyTrash: stats for s3:///tordo-nyw5e-000000000000000-volume: Deleted 782559703416 bytes in 12482 blocks. Remaining in trash: 3662365447 bytes in 62 blocks.",
    "time": "2024-03-13T20:48:35.614803661Z" 
  },
  {
    "ClusterID": "tordo",
    "PID": 225857,
    "Volume": "s3:///tordo-nyw5e-000000000000000-volume",
    "level": "info",
    "msg": "EmptyTrash: stats for s3:///tordo-nyw5e-000000000000000-volume: Deleted 0 bytes in 0 blocks. Remaining in trash: 4694240624 bytes in 78 blocks.",
    "time": "2024-03-15T17:03:58.901629882Z" 
  }
]

Reading data from a jutro collection via tordo:

{
  "ClusterID": "tordo",
  "PID": 264869,
  "RequestID": "req-1ay8ck3u6oacli0de09b",
  "level": "info",
  "msg": "request",
  "remoteAddr": "10.253.0.41:49522",
  "reqBytes": 0,
  "reqForwardedFor": "",
  "reqHost": "keep0.tordo.arvadosapi.com:25107",
  "reqMethod": "HEAD",
  "reqPath": "888248dacf4cf38691a7140ba9cd4740+38526214+Rjutro-96565456488863813f344bb0069834ea7dfd5b03@660f0fb7",
  "reqQuery": "",
  "time": "2024-03-21T20:38:15.378605744Z" 
}
{
  "ClusterID": "tordo",
  "PID": 264869,
  "level": "info",
  "msg": "blockReadRemote(888248dacf4cf38691a7140ba9cd4740+38526214+Rjutro-96565456488863813f344bb0069834ea7dfd5b03@660f0fb7): remote read(888248dacf4cf38691a7140ba9cd4740+38526214+A96565456488863813f344bb0069834ea7dfd5b03@660f0fb7)",
  "time": "2024-03-21T20:38:15.466313443Z" 
}
{
  "ClusterID": "tordo",
  "PID": 264869,
  "RequestID": "req-1ay8ck3u6oacli0de09b",
  "level": "info",
  "msg": "response",
  "priority": 0,
  "queue": "api",
  "remoteAddr": "10.253.0.41:49522",
  "reqBytes": 0,
  "reqForwardedFor": "",
  "reqHost": "keep0.tordo.arvadosapi.com:25107",
  "reqMethod": "HEAD",
  "reqPath": "888248dacf4cf38691a7140ba9cd4740+38526214+Rjutro-96565456488863813f344bb0069834ea7dfd5b03@660f0fb7",
  "reqQuery": "",
  "respBytes": 0,
  "respStatus": "OK",
  "respStatusCode": 200,
  "time": "2024-03-21T20:38:17.364671315Z",
  "timeToStatus": 1.986062,
  "timeTotal": 1.986062,
  "timeWriteBody": 0
}
{
  "ClusterID": "tordo",
  "PID": 264869,
  "RequestID": "req-1dpm85tltvyzz19xf67y",
  "level": "info",
  "msg": "request",
  "remoteAddr": "10.253.0.41:49522",
  "reqBytes": 0,
  "reqForwardedFor": "",
  "reqHost": "keep0.tordo.arvadosapi.com:25107",
  "reqMethod": "GET",
  "reqPath": "888248dacf4cf38691a7140ba9cd4740+38526214+A796420ec1bee1c1a16ba146f67d7cf2e7af99f56@660f0fb9",
  "reqQuery": "",
  "time": "2024-03-21T20:38:32.005439880Z" 
}
{
  "ClusterID": "tordo",
  "PID": 264869,
  "RequestID": "req-1dpm85tltvyzz19xf67y",
  "level": "info",
  "msg": "response",
  "priority": 0,
  "queue": "api",
  "remoteAddr": "10.253.0.41:49522",
  "reqBytes": 0,
  "reqForwardedFor": "",
  "reqHost": "keep0.tordo.arvadosapi.com:25107",
  "reqMethod": "GET",
  "reqPath": "888248dacf4cf38691a7140ba9cd4740+38526214+A796420ec1bee1c1a16ba146f67d7cf2e7af99f56@660f0fb9",
  "reqQuery": "",
  "respBytes": 38526214,
  "respStatus": "OK",
  "respStatusCode": 200,
  "time": "2024-03-21T20:38:32.335425196Z",
  "timeToStatus": 0.066235,
  "timeTotal": 0.329962,
  "timeWriteBody": 0.263727
}

Can't test "pull" with this single-volume config.

Actions #31

Updated by Tom Clegg about 1 month ago

Discovered bug #21617 while testing the "fetch block from remote cluster" function.

Actions #32

Updated by Peter Amstutz 30 days ago

  • Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Actions #33

Updated by Peter Amstutz 16 days ago

Tom to set up a test environment with two keepstores and with some misplaced blocks and show that keep-balance + keepstore causes the blocks to moved to the right place.

Actions #34

Updated by Peter Amstutz 16 days ago

  • Target version changed from Development 2024-04-10 sprint to Development 2024-04-24 sprint
Actions #35

Updated by Peter Amstutz 1 day ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions

Also available in: Atom PDF