Feature #22321
closedUpdate keepweb.handler to repack after writing
Description
- when handling a PUT request, first write the file (using replace_files); then call Repack (with CachedOnly: true) on the updated collection; then call Sync if anything was repacked
- this ensures the upload is preserved even if Repack/Sync goes badly, e.g., in a race with another update
- if another request is already running Sync on the same collection UUID, just skip it this time
Updated by Tom Clegg 5 months ago
- Related to Idea #20996: Efficient packing of small files into blocks in keep-web added
Updated by Peter Amstutz 3 months ago
- Target version changed from Development 2025-01-08 to Development 2025-01-29
Updated by Peter Amstutz 2 months ago
- Target version changed from Development 2025-01-29 to Development 2025-02-12
Updated by Peter Amstutz 2 months ago
- Target version changed from Development 2025-02-12 to Development 2025-02-26
Updated by Peter Amstutz about 1 month ago
- Target version changed from Development 2025-02-26 to Development 2025-03-19
Updated by Peter Amstutz 17 days ago
- Target version changed from Development 2025-03-19 to Development 2025-04-02
Updated by Tom Clegg 10 days ago
- Status changed from New to In Progress
22321-webdav-write-repack @ 56159300252047c1e281bd6464ea3a616bf2d9ce -- developer-run-tests: #4713
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- ✅ After a successful file upload, call Repack with
CachedOnly: true
. - ✅ ...but skip if another goroutine is already doing Repack/Sync on the same collection.
- ✅ Error in Repack/Sync does not cause the upload to fail.
- ✅ After a successful file upload, call Repack with
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- n/a
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- New test checks does some concurrent uploads followed by one single upload to the same collection after it's quiesced, and checks that the resulting collection has been repacked.
- New test also logs the number of distinct blocks after each file, and the logs show that some repacking does tend to happen during the concurrent phase (as opposed to only happening at the end after the last file) -- but this is inherently unpredictable so the test case doesn't have a failure threshold for this part.
- New or changed UX/UX and has gotten feedback from stakeholders.
- n/a
- Documentation has been updated.
- n/a
- Behaves appropriately at the intended scale (describe intended scale).
- The existing ConcurrentWrites test takes about 15% longer now (7 → 8 seconds).
- We could reduce the overhead by doing something like "skip if we've already repacked this collection <30 seconds ago, and fewer than 8 files have been uploaded since then".
- Based on the 15% figure and the charts on #22320#note-14, I'm inclined to keep it simple.
- Considered backwards and forwards compatibility issues between client and server.
- Due to a json-unmarshal bug fixed here, when you combine a new keep-web with an old controller, repacking will always fail. I think this is acceptable because
- the upload still succeeds even if repacking fails
- typically keep-web and controller get upgraded together
- Due to a json-unmarshal bug fixed here, when you combine a new keep-web with an old controller, repacking will always fail. I think this is acceptable because
- Follows our coding standards and GUI style guidelines.
- ✅
Updated by Tom Clegg 9 days ago
- Related to Idea #18342: Keep performance optimization added
Updated by Brett Smith 9 days ago
Tom Clegg wrote in #note-7:
22321-webdav-write-repack @ 56159300252047c1e281bd6464ea3a616bf2d9ce -- developer-run-tests: #4713
I think this is good to merge as-is. Here are things I think could make it even better, but they're basically all wishlist items.
- The 15% performance hit observed in the test seems at least noteworthy. I don't think it should block a merge, the test seems like an obviously worst-case scenario, and we're trading one kind of scale (net upload time) for another (manifest and later download performance). But I wonder if it would be wise to make a follow-up story to test tordo with real clients after this merges and deploys? To try to suss out if we see any more serious performance degredation we might want to get ahead of before we roll it out to our customers.
- In
block_segment_test.go
it would be nice to see a test for Marhsal and a test for round-tripping in addition to the test for Unmarshal. It seems to me like roundtripping is the main behavior we care about, and then if it fails hopefully the individual tests for each half can give you more information about which side to look at. - It would be nice if the new integration test was more DRY. For test readibility reasons: the fact that there's soooo much boilerplate in the build-and-serve-a-request stanza that appears five times makes it harder to suss out what's actually meaningfully different between those requests. When that's easier to see, it's easier to diagnose a test failure.
I know test utility functions easily fall into a trap of over-specializing and I wouldn't want to repeat that mistake either. As one idea, what about a function with the signature:
This could build the URL and then the Request from it. I think that would cut the boilerplate of each stanza in half, without interfering with the fact that each request gets served a little differently.func (s *IntegrationSuite) mustRequest(c *check.C, verb, uuid, path, token string, body *io.Reader) *http.Request
I'm not wedded to this specific idea, just trying to brainstorm ways the test could be more DRY without becoming brittle.
Thanks.
Updated by Tom Clegg 8 days ago
Brett Smith wrote in #note-9:
test tordo with real clients after this merges and deploys
Agreed, added #22707
- In
block_segment_test.go
it would be nice to see a test for Marhsal and a test for round-tripping in addition to the test for Unmarshal. It seems to me like roundtripping is the main behavior we care about, and then if it fails hopefully the individual tests for each half can give you more information about which side to look at.
Good call, added.
- It would be nice if the new integration test was more DRY.
Agreed, and we happen to already have a do() method that does nearly this entire thing. Updated that to accept a body argument, and deleted a lot of the repetitive stuff in ConcurrentWrites and Repack tests.
22321-webdav-write-repack @ 9ecb92a3c4170a6f074953b6dace913fc59c1e5a -- developer-run-tests: #4717
Updated by Brett Smith 8 days ago
Tom Clegg wrote in #note-10:
22321-webdav-write-repack @ 9ecb92a3c4170a6f074953b6dace913fc59c1e5a -- developer-run-tests: #4717
LGTM, thanks.