Feature #2960 (Closed)
Keepstore can stream GET and PUT requests using keep-gateway API
Updated by Tim Pierce over 10 years ago
- Tracker changed from Bug to Feature
- Subject changed from Keep streams GET and PUT requests to Keep can stream GET and PUT requests
The current implementation of Keep @ a4378cd handles both GET and PUT requests by reading an entire blob (up to 64 MB) into memory, performing checksums and any other policy checks, and then delivering the block appropriately. With many concurrent clients, this can drive Keep memory usage to 500 MB and beyond.
A more memory-efficient solution, with lower latency from the client's point of view, would be to stream data between the client and disk as much as possible, using smaller buffers. The keepclient package has some useful tools for this.
- When --serialize-io is enabled, Keep must continue to read an entire block and buffer it, or else a single slow client may block other clients while they wait for disk access.
- GET cannot issue a 402 Corruption error without hashing the block first. Possible solutions include:
  - Read the block twice through a small buffer: once to calculate the checksum, and again to stream it to the client.
  - Deliver any corruption status in the second part of a multipart/mixed message.
  - Abandon any attempt to report corruption to the client and simply log it to the data manager, on the grounds that the client is already responsible for checking for corrupt blocks anyway (see the sketch after this list).
- PUT's collision checking, which currently relies on GetBlock, will need to be rethought to handle streaming data.
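To illustrate the third GET option above (stream through a small buffer, hash on the fly, and log corruption rather than report it), here is a minimal Go sketch. The streamBlock helper is hypothetical; io.TeeReader and crypto/md5 are the standard-library pieces doing the work:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"strings"
)

// streamBlock copies a block from src to the client through io.Copy's
// small internal buffer, hashing the bytes as they pass through.
// Corruption is logged after the fact rather than reported to the
// client, because the response status has already been sent.
func streamBlock(dst io.Writer, src io.Reader, wantHash string) error {
	h := md5.New()
	// TeeReader feeds every byte read from src into the hash.
	n, err := io.Copy(dst, io.TeeReader(src, h))
	if err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHash {
		// Too late to change the status line; just log it.
		log.Printf("corrupt block: read %d bytes, md5 %s != %s", n, got, wantHash)
	}
	return nil
}

func main() {
	data := "hello"
	want := fmt.Sprintf("%x", md5.Sum([]byte(data)))
	_ = streamBlock(io.Discard, strings.NewReader(data), want)
}
```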
Updated by Ward Vandewege over 4 years ago
- Blocks Idea #16516: Run Keepstore on local compute nodes added
Updated by Peter Amstutz almost 3 years ago
- Related to Idea #18342: Keep performance optimization added
Updated by Peter Amstutz almost 3 years ago
- Blocks deleted (Idea #16516: Run Keepstore on local compute nodes)
Updated by Peter Amstutz about 1 year ago
- Release deleted (60)
- Target version set to Development 2023-10-25 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-10-25 sprint to Development 2023-11-08 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-29 sprint to Development 2023-11-08 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-29 sprint to Development 2023-11-08 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2023-11-29 sprint to Development 2024-01-03 sprint
Updated by Peter Amstutz about 1 year ago
- Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
Updated by Peter Amstutz 11 months ago
- Target version changed from Development 2024-01-17 sprint to Development 2024-01-31 sprint
Updated by Peter Amstutz 11 months ago
- Subject changed from Keep can stream GET and PUT requests to Keepstore can stream GET and PUT requests using keep-gateway API
Updated by Peter Amstutz 11 months ago
- Blocked by Feature #21197: refactor storage drivers to support keep-gateway API added
Updated by Peter Amstutz 10 months ago
- Target version changed from Development 2024-01-31 sprint to Development 2024-02-14 sprint
Updated by Tom Clegg 9 months ago
2960-keepstore-streaming @ 39f6e9f70f683237d9488faac1c549ca19ac9dae -- developer-run-tests: #4037
This is a substantial refactor of keepstore.
- Backend drivers are mostly unchanged, except to update the BlockWrite interface (formerly Put) and add a streaming mechanism to Azure/S3 BlockRead (formerly Get) -- see below.
- The http server is rewritten as two parts: (1) a router that deals with http Requests, ResponseWriters, Headers, etc., and (2) a keepstore that implements the operations BlockRead, BlockWrite, BlockTouch, etc. with typed parameters.
- Volumes now have access to the buffer pool when reading, so they can use it when needed, but a buffer is not allocated automatically by the higher layers. Practically, this means the unix/filesystem backend, with Serialize: false, will support more clients with a given buffer limit because it doesn't need read buffers.
- Writing is still store-and-forward for now. Streaming writes while maintaining the md5 checks is possible, but a bit more complicated, and our clients write asynchronously anyway, so I think it makes sense to set this aside for now.
- status.json and debug.json are removed.
- Several cleanup/simplification changes, like:
- Rename S3AWSVolume to S3Volume etc. now that we don't have two separate drivers called "S3" and "S3AWS".
- Un-export various identifiers that aren't meant to be part of a public interface.
- Replace the "work queue" abstraction layer with a sync.Cond that notifies workers when the todo list becomes non-empty (see the first sketch after this list). Admittedly the pull/trash queues have some nearly identical code now -- but they already had nearly identical code even when they were purportedly sharing the work queue abstraction layer, so I think simpler is better.
- Tidy up volume interface method names and arguments, e.g., the block read/write funcs take hashes, not locators.
- Remove "Compare" method from volume interface -- now that volumes do streaming reads, the keepstore layer can call volume.BlockRead and do the compare itself. (but see below)
- Remove ReadOnly / AllowTrashWhenReadOnly checks from the volume's responsibilities -- now volumes just assume the keepstore layer is doing those checks.
- Reading now requires hash+size; reading a bare hash doesn't work. AFAICT the only case where this comes up is a pull list sent by a previous version of keep-balance, which (for no good reason) stripped the +size part. Pull lists from old versions of keep-balance are now rejected, which will be logged by keep-balance (rather than N "can't pull block" logs from keepstore).
- HEAD requests for a bare hash still work. keep-block-check tests rely on this, and it seems worth preserving if only so an admin can get a list of hashes from a backend volume and feed them to keep-block-check without having to attach the file sizes.
- Some HTTP status codes changed, e.g., 400 instead of 404 when trying to read a locator with an invalid/missing signature. This meant updating some Python test cases. I'm inclined to think nobody IRL is relying on that to return 404.
- The current "compare" implementation has a problem: it uses two buffers for a Write, one for the incoming data and one for Compare. I think the solution is to change the volume's BlockRead interface to accept an io.WriterAt instead of an io.Writer, and put the "reassemble/buffer stream" stuff in the keepstore layer instead of the backend driver layer. Compare should be easy enough to implement as an io.WriterAt (see the second sketch after this list).
- Add back the comments on the volume interface describing requirements for backend implementation
- Large file download tests on dev cluster
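To make the sync.Cond replacement above concrete, here is a minimal sketch of a todo list that wakes waiting workers when it becomes non-empty. This is an assumed shape for illustration, not the committed pull/trash queue code:

```go
package main

import (
	"fmt"
	"sync"
)

// workList is a hypothetical stand-in for the pull/trash todo lists:
// a slice guarded by a mutex, with a sync.Cond that notifies workers
// when the list becomes non-empty.
type workList struct {
	mtx  sync.Mutex
	cond *sync.Cond
	todo []string
}

func newWorkList() *workList {
	wl := &workList{}
	wl.cond = sync.NewCond(&wl.mtx)
	return wl
}

// SetTodo replaces the todo list and wakes all waiting workers.
func (wl *workList) SetTodo(items []string) {
	wl.mtx.Lock()
	wl.todo = items
	wl.mtx.Unlock()
	wl.cond.Broadcast()
}

// Next blocks until an item is available, then removes and returns it.
func (wl *workList) Next() string {
	wl.mtx.Lock()
	defer wl.mtx.Unlock()
	for len(wl.todo) == 0 {
		wl.cond.Wait()
	}
	item := wl.todo[0]
	wl.todo = wl.todo[1:]
	return item
}

func main() {
	wl := newWorkList()
	done := make(chan struct{})
	go func() {
		fmt.Println("worker got:", wl.Next())
		close(done)
	}()
	wl.SetTodo([]string{"pull block abc"})
	<-done
}
```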
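And for the Compare-as-io.WriterAt idea: a hedged sketch in which a comparer type satisfies io.WriterAt by checking each chunk against the expected bytes instead of storing it, so a streaming BlockRead can "write" into it without a second full-block buffer. The type and field names are assumptions for illustration:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// comparer implements io.WriterAt by comparing each written chunk
// against a reference buffer instead of storing it. A streaming
// BlockRead that writes into an io.WriterAt can thereby perform a
// collision check with no extra 64 MiB buffer.
type comparer struct {
	expect []byte // the incoming data the stored block must match
}

func (c *comparer) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 || off+int64(len(p)) > int64(len(c.expect)) {
		return 0, errors.New("write beyond expected block size")
	}
	if !bytes.Equal(p, c.expect[off:off+int64(len(p))]) {
		return 0, errors.New("data mismatch (hash collision?)")
	}
	return len(p), nil
}

func main() {
	c := &comparer{expect: []byte("stored block content")}
	if _, err := c.WriteAt([]byte("stored"), 0); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("chunk matches")
}
```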
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-02-14 sprint to Development 2024-02-28 sprint
Updated by Tom Clegg 9 months ago
2960-keepstore-streaming @ 62168c2db5c36de2362cd1d5785b598b187bbef3 -- developer-run-tests: #4046
(sdk/R install is failing on http://cran.wustl.edu/src/contrib/stringi_1.8.3.tar.gz as usual)
- Fixes "compare" implementation and moves the "reassemble/buffer stream" code out of the driver layer, as described above.
- Adds back the comments on the volume interface describing requirements for backend implementation
- Setup: tordo (AWS), single keepstore server -> keep-web -> curl (on same machine as keep-web); repeatedly downloading the same 1.8 GB file. Keep-web's filesystem is small enough that the disk cache always misses in this setup.
- MB/s @ 2.8.0~dev20240215211742: 75, 90, 96, 97, 98, 91, 97, 91 (mean = 92)
- MB/s @ 62168c: 130, 130, 121, 125, 135, 128, 139, 132, 121 (mean = 129)
- From a cursory look at keepstore logs, typical timeToStatus reduces from about 0.25 s to 0.03 s.
- My guess is that the results would be very different on a higher-powered node, but a 40% increase in throughput is somewhat encouraging.
Updated by Peter Amstutz 9 months ago
The new streaming architecture makes sense. The layering makes sense and from what I remember, it is actually significantly simpler than the old code.
It seems that the keepstore struct doesn't implement the KeepGateway interface (there's no ReadAt or LocalLocator). We were talking about standalone keepstore, crunch-run local keepstore, keepproxy, etc. just being different stacks of KeepGateway objects (as well as cutting out the middleman by having keep-web read directly from the underlying S3 bucket). Is that still the plan?
Like the proverbial bikeshed, I mostly have minor thoughts that came up looking through keepstore.go:
- Would it make sense to have parseLocator and locatorInfo as part of the SDK? This can't be the only place we split and assemble locators.
- blockReadRemote seems to be doing some redundant work that could be done by parseLocator; I suppose it is not that urgent, but maybe a chance to DRY.
Updated by Tom Clegg 9 months ago
Peter Amstutz wrote in #note-25:
The new streaming architecture makes sense. The layering makes sense and from what I remember, it is actually significantly simpler than the old code.
Phew, yes. I'm glad it's not just me.
It seems that the keepstore struct doesn't implement the KeepGateway interface (there's no ReadAt or LocalLocator). We were talking about standalone keepstore, crunch-run local keepstore, keepproxy, etc. just being different stacks of KeepGateway objects (as well as cutting out the middleman by having keep-web read directly from the underlying S3 bucket). Is that still the plan?
Yes, I'm still hoping for more progress in that direction. For example, keep-web can't yet import keepstore and start using it instead of a keepclient, but from here we can get there without any major refactoring.
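A hedged sketch of that stacking idea: every layer implements the same gateway interface and delegates to the one below it, so standalone keepstore, a cache, or a proxy are just different stacks. The blockGateway interface here is a stand-in with assumed method signatures, not the SDK's actual KeepGateway definition:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// blockGateway is a hypothetical stand-in for the SDK's KeepGateway;
// the method name follows the discussion above, the signature is assumed.
type blockGateway interface {
	BlockRead(ctx context.Context, hash string, w io.Writer) (int, error)
}

// loggingGateway is one "layer" in a stack: it wraps another gateway
// and adds behavior (here, just logging) before delegating.
type loggingGateway struct {
	next blockGateway
}

func (g *loggingGateway) BlockRead(ctx context.Context, hash string, w io.Writer) (int, error) {
	fmt.Println("BlockRead", hash)
	return g.next.BlockRead(ctx, hash, w)
}

// stubBackend stands in for the bottom of the stack (e.g., keepstore
// reading from a volume).
type stubBackend struct{}

func (stubBackend) BlockRead(ctx context.Context, hash string, w io.Writer) (int, error) {
	n, err := io.Copy(w, strings.NewReader("block data for "+hash))
	return int(n), err
}

func main() {
	var gw blockGateway = &loggingGateway{next: stubBackend{}}
	var buf strings.Builder
	gw.BlockRead(context.Background(), "d41d8cd98f00b204e9800998ecf8427e+0", &buf)
	fmt.Println(buf.String())
}
```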
- Would it make sense to have parseLocator and locatorInfo as part of the SDK? This can't be the only place we split and assemble locators.
With parseLocator/locatorInfo I was aiming for an efficient way to get the few bits of information needed for keepstore logic.
I see your point though. We do have several variations on this floating around:
- type SizedDigest string in source:sdk/go/arvados/keep_block.go
- func MakeLocator(path string) in source:sdk/go/keepclient/keepclient.go
- SignedLocatorRe regexp in source:sdk/go/arvados/blob_signature.go
Unifying these would be nice but not exactly trivial. I'm not sure whether we should get into it now. I suppose the saving grace is that the format is simple enough for all of those implementations to correctly find what they're looking for, even if they don't all detect the same set of invalid locators.
Meanwhile, now that you made me look, I added a few more test cases and rewrote parseLocator to avoid making a slice of substrings, and renamed it to getLocatorInfo to make the connection to locatorInfo more obvious.
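For illustration, a minimal sketch of that kind of single-pass locator scan, walking the +-separated fields without building a slice of substrings. The field set and struct layout are assumptions for this sketch, not the actual getLocatorInfo/locatorInfo code:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// locatorInfo holds the few fields keepstore-style logic needs.
// The field set here is assumed for illustration.
type locatorInfo struct {
	hash   string
	size   int64
	signed bool // +A hint present
	remote bool // +R hint present
}

// getLocatorInfo scans "<md5>+size+hints" left to right, re-slicing
// the original string instead of allocating a []string of substrings.
func getLocatorInfo(loc string) (locatorInfo, error) {
	var li locatorInfo
	if len(loc) < 32 {
		return li, errors.New("invalid locator: too short")
	}
	li.hash = loc[:32] // real code would also validate hex digits
	for rest := loc[32:]; len(rest) > 0; {
		if rest[0] != '+' {
			return li, errors.New("invalid locator: expected '+'")
		}
		rest = rest[1:]
		end := strings.IndexByte(rest, '+')
		if end < 0 {
			end = len(rest)
		}
		field := rest[:end]
		rest = rest[end:]
		switch {
		case field != "" && field[0] >= '0' && field[0] <= '9':
			size, err := strconv.ParseInt(field, 10, 64)
			if err != nil {
				return li, err
			}
			li.size = size
		case strings.HasPrefix(field, "A"):
			li.signed = true
		case strings.HasPrefix(field, "R"):
			li.remote = true
		}
	}
	return li, nil
}

func main() {
	li, err := getLocatorInfo("d41d8cd98f00b204e9800998ecf8427e+0+Afake@12345678")
	fmt.Printf("%+v %v\n", li, err)
}
```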
blockReadRemote seems to be doing some redundant work that could be done by parseLocator; I suppose it is not that urgent, but maybe a chance to DRY.
Unlike parseLocator/getLocatorInfo it needs to construct a new locator with all but the +R hint, but I've at least dropped the strconv.Atoi call in favor of calling getLocatorInfo. Explicitly calling getLocatorInfo to establish that the locator is valid certainly seems cleaner.
2960-keepstore-streaming @ dc6b5b04d0b0ea40e9e9caa499f070f95b99fa07 -- developer-run-tests: #4050
(all tests passed except install sdk/R)
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-02-28 sprint to Development 2024-03-13 sprint
Updated by Peter Amstutz 9 months ago
This LGTM, but we should continue to test it on the dev cluster -- I know the read streaming has been tested a lot already, but we should try to confirm that writes, pull lists, trash lists and indexes are working when deployed in a more realistic environment than the unit tests.
Updated by Tom Clegg 8 months ago
Read/write/index/trash/emptytrash have evidently been working on tordo:
[
{
"ClusterID": "tordo",
"PID": 211083,
"Volume": "s3:///tordo-nyw5e-000000000000000-volume",
"level": "info",
"msg": "EmptyTrash: stats for s3:///tordo-nyw5e-000000000000000-volume: Deleted 0 bytes in 0 blocks. Remaining in trash: 782676187225 bytes in 12484 blocks.",
"time": "2024-03-12T20:45:10.641423112Z"
},
{
"ClusterID": "tordo",
"PID": 211083,
"Volume": "s3:///tordo-nyw5e-000000000000000-volume",
"level": "info",
"msg": "EmptyTrash: stats for s3:///tordo-nyw5e-000000000000000-volume: Deleted 782559703416 bytes in 12482 blocks. Remaining in trash: 3662365447 bytes in 62 blocks.",
"time": "2024-03-13T20:48:35.614803661Z"
},
{
"ClusterID": "tordo",
"PID": 225857,
"Volume": "s3:///tordo-nyw5e-000000000000000-volume",
"level": "info",
"msg": "EmptyTrash: stats for s3:///tordo-nyw5e-000000000000000-volume: Deleted 0 bytes in 0 blocks. Remaining in trash: 4694240624 bytes in 78 blocks.",
"time": "2024-03-15T17:03:58.901629882Z"
}
]
Reading data from a jutro collection via tordo:
{
"ClusterID": "tordo",
"PID": 264869,
"RequestID": "req-1ay8ck3u6oacli0de09b",
"level": "info",
"msg": "request",
"remoteAddr": "10.253.0.41:49522",
"reqBytes": 0,
"reqForwardedFor": "",
"reqHost": "keep0.tordo.arvadosapi.com:25107",
"reqMethod": "HEAD",
"reqPath": "888248dacf4cf38691a7140ba9cd4740+38526214+Rjutro-96565456488863813f344bb0069834ea7dfd5b03@660f0fb7",
"reqQuery": "",
"time": "2024-03-21T20:38:15.378605744Z"
}
{
"ClusterID": "tordo",
"PID": 264869,
"level": "info",
"msg": "blockReadRemote(888248dacf4cf38691a7140ba9cd4740+38526214+Rjutro-96565456488863813f344bb0069834ea7dfd5b03@660f0fb7): remote read(888248dacf4cf38691a7140ba9cd4740+38526214+A96565456488863813f344bb0069834ea7dfd5b03@660f0fb7)",
"time": "2024-03-21T20:38:15.466313443Z"
}
{
"ClusterID": "tordo",
"PID": 264869,
"RequestID": "req-1ay8ck3u6oacli0de09b",
"level": "info",
"msg": "response",
"priority": 0,
"queue": "api",
"remoteAddr": "10.253.0.41:49522",
"reqBytes": 0,
"reqForwardedFor": "",
"reqHost": "keep0.tordo.arvadosapi.com:25107",
"reqMethod": "HEAD",
"reqPath": "888248dacf4cf38691a7140ba9cd4740+38526214+Rjutro-96565456488863813f344bb0069834ea7dfd5b03@660f0fb7",
"reqQuery": "",
"respBytes": 0,
"respStatus": "OK",
"respStatusCode": 200,
"time": "2024-03-21T20:38:17.364671315Z",
"timeToStatus": 1.986062,
"timeTotal": 1.986062,
"timeWriteBody": 0
}
{
"ClusterID": "tordo",
"PID": 264869,
"RequestID": "req-1dpm85tltvyzz19xf67y",
"level": "info",
"msg": "request",
"remoteAddr": "10.253.0.41:49522",
"reqBytes": 0,
"reqForwardedFor": "",
"reqHost": "keep0.tordo.arvadosapi.com:25107",
"reqMethod": "GET",
"reqPath": "888248dacf4cf38691a7140ba9cd4740+38526214+A796420ec1bee1c1a16ba146f67d7cf2e7af99f56@660f0fb9",
"reqQuery": "",
"time": "2024-03-21T20:38:32.005439880Z"
}
{
"ClusterID": "tordo",
"PID": 264869,
"RequestID": "req-1dpm85tltvyzz19xf67y",
"level": "info",
"msg": "response",
"priority": 0,
"queue": "api",
"remoteAddr": "10.253.0.41:49522",
"reqBytes": 0,
"reqForwardedFor": "",
"reqHost": "keep0.tordo.arvadosapi.com:25107",
"reqMethod": "GET",
"reqPath": "888248dacf4cf38691a7140ba9cd4740+38526214+A796420ec1bee1c1a16ba146f67d7cf2e7af99f56@660f0fb9",
"reqQuery": "",
"respBytes": 38526214,
"respStatus": "OK",
"respStatusCode": 200,
"time": "2024-03-21T20:38:32.335425196Z",
"timeToStatus": 0.066235,
"timeTotal": 0.329962,
"timeWriteBody": 0.263727
}
Can't test "pull" with this single-volume config.
Updated by Peter Amstutz 8 months ago
- Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Updated by Peter Amstutz 7 months ago
Tom to set up a test environment with two keepstores and some misplaced blocks, and show that keep-balance + keepstore causes the blocks to be moved to the right place.
Updated by Peter Amstutz 7 months ago
- Target version changed from Development 2024-04-10 sprint to Development 2024-04-24 sprint
Updated by Peter Amstutz 7 months ago
- Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Updated by Peter Amstutz 7 months ago
- Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Updated by Tom Clegg 6 months ago
- File integrationSuite.TestBalanceAPIFixtures.log.txt integrationSuite.TestBalanceAPIFixtures.log.txt added
We do have a keep-balance integration test that verifies keepstore's pull feature is working:
What next?
test services/keep-balance -check.f=BalanceAPIFixtures -check.vv
...
time="2024-05-15T09:39:42-04:00" level=info msg="1 replicas (1 blocks, 3 bytes) underreplicated (0<have<want)"
...
time="2024-05-15T09:39:42-04:00" level=info msg="zzzzz-bi6l4-keepdisk0000000 (localhost:34149, disk): ChangeSet{Pulls:1, Trashes:0} Deferred{Pulls:0 Trashes:0}\n"
...
time="2024-05-15T09:39:43-04:00" level=info msg="0 replicas (0 blocks, 0 bytes) underreplicated (0<have<want)"
...
time="2024-05-15T09:39:43-04:00" level=info msg="zzzzz-bi6l4-keepdisk0000000 (localhost:34149, disk): ChangeSet{Pulls:0, Trashes:0} Deferred{Pulls:0 Trashes:0}\n"
...