Project

General

Profile

Actions

Feature #21702

closed

keep-web uses replace_files API to add/replace/rename/remove files atomically

Added by Tom Clegg 9 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Story points:
-
Release:
Release relationship:
Auto

Description

See Concurrent writes to a single collection

Instead of writing to the target path in the CustomFileSystem, keep-web will write the uploaded data to a new collection, then supply the resulting manifest to the replace_files API as source to atomically copy the new file into the target collection.


Subtasks 1 (0 open1 closed)

Task #22011: Review 21702-keep-web-replace_filesResolvedTom Clegg09/25/2024Actions

Related issues 3 (1 open2 closed)

Related to Arvados Epics - Idea #18342: Keep performance optimizationNew08/01/202312/31/2024Actions
Related to Arvados - Bug #22184: Consistent WebDAV behavior in load-balanced installationResolvedTom CleggActions
Has duplicate Arvados - Feature #18871: WebDAV uses replace_files APIDuplicateActions
Actions #1

Updated by Tom Clegg 9 months ago

  • Related to Idea #18342: Keep performance optimization added
Actions #3

Updated by Peter Amstutz 8 months ago

  • Target version set to Development 2024-06-05 sprint
Actions #4

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-06-05 sprint to 439
Actions #5

Updated by Peter Amstutz 8 months ago

  • Target version changed from 439 to Development 2024-06-05 sprint
Actions #6

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-06-05 sprint to 439
Actions #7

Updated by Peter Amstutz 8 months ago

  • Target version changed from 439 to Development 2024-07-03 sprint
Actions #8

Updated by Peter Amstutz 7 months ago

  • Release set to 70
Actions #9

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-07-03 sprint to Development 2024-08-07 sprint
Actions #10

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-08-07 sprint to Development 2024-07-24 sprint
Actions #11

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-07-24 sprint to Development 2024-08-07 sprint
Actions #12

Updated by Peter Amstutz 6 months ago

  • Assigned To set to Tom Clegg
Actions #13

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-08-07 sprint to Development 2024-08-28 sprint
Actions #14

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-08-28 sprint to Development 2024-09-11 sprint
Actions #15

Updated by Tom Clegg 4 months ago

  • Status changed from New to In Progress
Actions #16

Updated by Tom Clegg 4 months ago

21702-keep-web-replace_files @ 6b978471b3be79960b230b0278fcb778cbba8101 -- developer-run-tests: #4439

  • All agreed upon points are implemented / addressed.
    • ✅ keep-web uses the replace_files API, and does not serialize concurrent writes to a single collection
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • ✅ upsized IntegrationSuite.TestConcurrentWrites a bit by doing more writes, and writing larger files
  • Documentation has been updated.
    • N/A
  • Behaves appropriately at the intended scale (describe intended scale).
    • ✅ it's now safe to run multiple keep-web servers behind a load balancer
  • Considered backwards and forwards compatibility issues between client and server.
    • no known compatibility issues
  • Follows our coding standards and GUI style guidelines.

There's a bit of code that mimics functionality in the WebDAV library: we need to understand the "destination" header in a move/copy request in order to build an appropriate replace_files request.

I contemplated skipping the webdav handler entirely for the MOVE/COPY/DELETE/MKCOL cases, because we don't even look at the modifications they make to the temporary filesystem. But the webdav handler does implement a number of checks (like refusing to MOVE a collection with Depth other than infinity) and I'd rather accept redundant filesystem operations than duplicate all that validation code.

That would be worth revisiting that if/when someone starts using the WebDAV COPY method on non-trivial-sized files/directories. As it stands the webdav handler actually reads and writes all of the file content, only for us to throw away the result. To be clear, this branch doesn't make the situation worse at all.

In trials on my dev machine, the (new upsized) IntegrationSuite.TestConcurrentWrites test runs in about 25s, compared to 35s with the old "serialize writes to same collection ID" code.

I think the performance improvement will be bigger for some more lifelike scenarios -- in particular, concurrent writes where the file data transfer from client to Keep takes much longer than the collection-update transaction. I'm not sure that kind of performance testing should be a blocker, given the other (more important?) goal of correctness in a load-balancing setup.

Actions #17

Updated by Tom Clegg 4 months ago

TODO:
  • test copy collection with depth 0
  • test copy/move collection with invalid depth
Actions #18

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-09-11 sprint to Development 2024-09-25 sprint
Actions #19

Updated by Tom Clegg 4 months ago

21702-keep-web-replace_files @ 87335196b621d4d3ebaba6267734cf3d888338b7 -- developer-run-tests: #4445
  • fix COPY collection with depth 0
  • test copy/move/delete with various allowed/disallowed depth headers
  • revert double-size TestConcurrentWrites (decided it was using too many gigabytes of RAM for no good reason)
Actions #20

Updated by Brett Smith 4 months ago

The comments are great, the code is straightforward, I would be fine for this to be merged as-is, I do have a couple of suggestions for improvement.

I'm concerned about the way that the loop of TestDepthHeader relies on past trials to have succeeded, but merely checks success with c.Check. This means if one early trial fails, potentially many others will fail after it because their expected preconditions are not met. This will generate a lot of failures and make it more difficult to find the "real" problem.

Minimum nicest change would be to change the Check calls to Assert, since we can't know if it's safe to continue testing after a failure.

Next nice change on top of that would be to test each "set of trials" (starting with the setup triplet) separately. e.g., you could change TestDepthHeader to a function that takes a slice of trials, and then have a separate functions for TestDeleteDepth, TestMoveDepth, and TestDeleteEmptyDirectoryDepth (and maybe this could even be broken down further into different sets).

In the code, I think it would be nice to s/rfc4918/RFC 4918/g to match the IETF's own styling.

Thanks.

Actions #21

Updated by Tom Clegg 4 months ago

21702-keep-web-replace_files @ ac5bfa49f0e08fd635071c03966ac23b30b39073 -- developer-run-tests: #4453
  • check → assert
  • rfcNNNN → RFC NNNN (here and in a few other places)
  • merged main

failed keepstoreSuite.TestBlockWrite_MultipleStorageClasses -- retry developer-run-tests-remainder: #4675

Actions #22

Updated by Tom Clegg 3 months ago

  • Status changed from In Progress to Resolved
Actions #23

Updated by Peter Amstutz 3 months ago

Actions #24

Updated by Tom Clegg 2 months ago

  • Related to Bug #22184: Consistent WebDAV behavior in load-balanced installation added
Actions

Also available in: Atom PDF