Project

General

Profile

Actions

Feature #14538

closed

[keep-web] Do not block writes while flushing blocks to Keep

Added by Tom Clegg about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Story points:
-
Release relationship:
Auto

Description

Currently, writing a large file via collectionfs (e.g., webdav PUT) pauses every 64 MiB while a block is written to Keep.

The block flushing should be done in the background and subsequent writes should be allowed to continue.


Subtasks 1 (0 open1 closed)

Task #14548: Review 14538-async-writeResolvedPeter Amstutz11/28/2018Actions
Actions #1

Updated by Tom Clegg about 6 years ago

  • Assigned To set to Tom Clegg
  • Target version changed from To Be Groomed to 2018-12-12 Sprint
Actions #2

Updated by Tom Clegg about 6 years ago

  • Category set to Keep
  • Status changed from New to In Progress

14538-async-write @ 78c18757e42c40178d7a9eaf78f7b6d167bee926

This allows up to 4 concurrent writes per file while writing, and up to 4 concurrent writes per collection when syncing a collection in MarshalManifest.

TODO:
  • make the limit configurable
  • limit concurrency at the filesystem level instead of per-file, to accommodate callers that write multiple large files concurrently

I'm not sure whether this should be held up by one or both of those.

Actions #3

Updated by Tom Clegg about 6 years ago

I'm thinking the default/initial write concurrency limit (flushing finished blocks while writing a file) should be 1 rather than 4. If the Keep side is slower than the incoming data, more concurrency won't necessarily help -- it'll just use more memory.

Actions #4

Updated by Peter Amstutz about 6 years ago

Over all LGTM assuming tests pass (link to jenkins below). Here's some mostly stylistic comments, take what you will.

In dirnode.sync():

  • Stylistically, I don't know if flush() really benefits from being declared inline. It could just as easily be a method of dirnode and take throttle as a parameter. In order to follow execution of the code, I have to scroll down to the bottom of the function, and then scroll back up again.
  • pendingLen is used before it is assigned. I know this means it has its default value (0) but it would be helpful to make that explicit so it doesn't look like a bug.

In pruneMemSegments():

  • This seems redundant: seg.Len() < maxBlockSize || seg.Len() == 0
  • Is there a potential race between pruneMemSegments() and sync() ? sync() does not check "flushing" so it seems like it might try to flush a block which is already being flushed. By my reading of the code, it probably is not a disaster if this happens (it would replace one storedSegment with an equivalent one) but worth considering. Now I see it calls waitPrune() before that, so no problem.

I'm thinking the default/initial write concurrency limit (flushing finished blocks while writing a file) should be 1 rather than 4. If the Keep side is slower than the incoming data, more concurrency won't necessarily help -- it'll just use more memory.

I expect the common case is slow clients / fast backend, in which case more threads won't get used. And as you said, fast client / slow backend would result in a pileup. The Python client uses 2 threads by default, which enables it to continue making progress if one write stalls due to a transient failure.

I'm a little concerned that flushing a collection seems to require iterating over every single segment of every single file, which could be expensive for very large collections (some benchmarking is warranted).

I submitted a test run here:

https://ci.curoverse.com/view/Developer/job/developer-run-tests/986/

Actions #5

Updated by Tom Clegg about 6 years ago

Peter Amstutz wrote:

  • Stylistically, I don't know if flush() really benefits from being declared inline. It could just as easily be a method of dirnode and take throttle as a parameter. In order to follow execution of the code, I have to scroll down to the bottom of the function, and then scroll back up again.

Yes, sync() was getting long. Moved flush to (*dirnode)commitBlock(ctx, throttle, []fnSegmentRef).

  • pendingLen is used before it is assigned. I know this means it has its default value (0) but it would be helpful to make that explicit so it doesn't look like a bug.

Done.

  • This seems redundant: seg.Len() < maxBlockSize || seg.Len() == 0

Indeed. Removed the == 0 part.

I'm a little concerned that flushing a collection seems to require iterating over every single segment of every single file, which could be expensive for very large collections (some benchmarking is warranted).

This seems unavoidable, since flushing a collection involves producing a manifest which references every segment of every file. Now we iterate twice, but (even without the concurrency being added here) I expect that's still much faster than the ensuing network round-trip with the resulting manifest.

We could stash the last known manifest text and use a "dirty" flag to optimize the no-op case, but it seems out of scope here unless I'm missing something.

test run

Fixed flaky test.

14538-async-write @ a88f7ad9728ee6968367928c6d3d7613bbf290ec https://ci.curoverse.com/view/Developer/job/developer-run-tests/988/

Actions #6

Updated by Peter Amstutz about 6 years ago

Tom Clegg wrote:

I'm a little concerned that flushing a collection seems to require iterating over every single segment of every single file, which could be expensive for very large collections (some benchmarking is warranted).

This seems unavoidable, since flushing a collection involves producing a manifest which references every segment of every file. Now we iterate twice, but (even without the concurrency being added here) I expect that's still much faster than the ensuing network round-trip with the resulting manifest.

The Python client is structured a little differently, block operations go through a BlockManager which tracks which blocks are pending, so there's no need to iterate over all the segments when flushing blocks. But this story is about incrementally flushing a PUT of a single large file so it isn't going to be iterating over the manifest except at the end.

We could stash the last known manifest text and use a "dirty" flag to optimize the no-op case, but it seems out of scope here unless I'm missing something.

14538-async-write @ a88f7ad9728ee6968367928c6d3d7613bbf290ec https://ci.curoverse.com/view/Developer/job/developer-run-tests/988/

This LGTM.

Actions #7

Updated by Tom Clegg about 6 years ago

  • Status changed from In Progress to Resolved
Actions #8

Updated by Tom Morris almost 6 years ago

  • Release set to 15
Actions

Also available in: Atom PDF