Bug #10467

[Keepstore] abandon work when client hangs up

Added by Ward Vandewege almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Start date:
11/07/2016
Due date:
% Done:

86%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Subtasks

Task #10475: Review 10467-client-disconnectResolvedTom Clegg

Task #10509: Add testsResolvedTom Clegg

Task #10697: Update Azure driverResolvedTom Clegg

Task #10637: Review 10467-client-disconnectResolvedTom Clegg

Task #10698: Update local disk driverResolvedTom Clegg

Task #10731: Review 10467-client-disconnectResolvedPeter Amstutz

Task #10754: Move ctx-cancel tests to generic suiteClosedTom Clegg


Related issues

Related to Arvados - Bug #10794: [keepstore] Refactor volume Get/Put funcs to BlockReader/BlockWriter interfacesNew01/03/2017

Is duplicate of Arvados - Story #8061: [Keep] keepstore checks if a GET client is still there before loading a block, and before sending itDuplicate12/18/2015

Associated revisions

Revision fd26f550
Added by Tom Clegg almost 3 years ago

Merge branch '10467-client-disconnect' refs #10467

Revision 9b0654ad
Added by Tom Clegg almost 3 years ago

Merge branch '10467-client-disconnect' refs #10467

Revision cafa08ea
Added by Tom Clegg almost 3 years ago

Merge branch '10467-client-disconnect' refs #10467

Revision b405f0f4
Added by Tom Clegg almost 3 years ago

Merge branch '10467-client-disconnect' closes #10467

History

#1 Updated by Tom Clegg almost 3 years ago

  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg

work in progress on 10467-client-disconnect

#2 Updated by Tom Clegg almost 3 years ago

10467-client-disconnect -- test 9411197dfa8ff4c7d935a395a04b5846c7b52ffd

(note -- this is based on a not-yet-merged 10468 branch)

#3 Updated by Tom Clegg almost 3 years ago

  • Target version set to 2016-11-09 sprint

#4 Updated by Peter Amstutz almost 3 years ago

  • Uses context.TODO() but documentation says to use context.Background() (https://golang.org/pkg/context/)
  • Unix and Azure blob volume doesn't support canceling Get() (future story)
  • I feel like we would benefit from an additional abstraction here, something like a "cancellable pipe" which implements the behavior of passing data through or on cancel abruptly terminates the current read or write. This would make cancellation simpler because by exposing it as as IO read or write errors.
  • I don't see any tests that cancel anything?

#5 Updated by Tom Clegg almost 3 years ago

Peter Amstutz wrote:

Ah yes, fixed tests and pull_worker. Handlers still use TODO "because the surrounding function has not yet been extended to accept a Context parameter".

  • I feel like we would benefit from an additional abstraction here, something like a "cancellable pipe" which implements the behavior of passing data through or on cancel abruptly terminates the current read or write. This would make cancellation simpler because by exposing it as as IO read or write errors.

I think what you're looking for is io.Pipe() (see "CloseWithError") which I used in s3 PUT to ensure a stalled s3 operation can't wake up later and resume use of a buffer that has long since been repurposed for a different request.

We could move that up a layer if we refactored our Volume interface to use readers and writers instead of passing a buffer. Is that what you mean? (It might be tricky to implement the "concurrent GET requests" Azure volume feature without using 2x memory, but then again, maybe that concurrency feature should also be moved up a layer anyway, and volumes should just support range requests.)

  • I don't see any tests that cancel anything?

Yeah, that'd be good.

#6 Updated by Tom Clegg almost 3 years ago

  • Target version changed from 2016-11-09 sprint to 2016-11-23 sprint

#7 Updated by Tom Clegg almost 3 years ago

  • Story points set to 0.5

#8 Updated by Tom Clegg almost 3 years ago

  • Target version changed from 2016-11-23 sprint to 2016-12-14 sprint

#9 Updated by Tom Clegg almost 3 years ago

Added tests for Get, Put, Compare.

10467-client-disconnect - test e675118bd2b28ec40833d06ea384b6f1c78f3039

#10 Updated by Peter Amstutz almost 3 years ago

Tests LGTM.

#11 Updated by Tom Clegg almost 3 years ago

  • Category set to Keep
  • Target version changed from 2016-12-14 sprint to 2017-01-04 sprint
  • Story points changed from 0.5 to 1.0

#12 Updated by Tom Clegg almost 3 years ago

Azure implementation on 10467-client-disconnect @ d0dc273718a2f657643a2b2800d984c7a3a62f78

#13 Updated by Peter Amstutz almost 3 years ago

Tom Clegg wrote:

Azure implementation on 10467-client-disconnect @ d0dc273718a2f657643a2b2800d984c7a3a62f78

This LGTM.

#14 Updated by Tom Clegg almost 3 years ago

10467-client-disconnect @ 568c7abf660b7a68f70b6ea47ae2e7352233f053 https://ci.curoverse.com/job/developer-run-tests/121/

If this goes well, next the Azure and S3 drivers can be refactored to use these getWithPipe() and putWithPipe() funcs. That should simplify them, and pave the way to changing the Volume interface to use io.PipeReader and io.PipeWriter instead of passing bufs around.

#15 Updated by Peter Amstutz almost 3 years ago

The approach of passing functions (getWithPipe, get, getFunc) into other functions in order to wrap the behavior is hard to follow and feels a bit like Ruby written in Go. I think this would be a bit more Go-like to return an io.Reader and have the Close() method take care of calling unlock(), similar to errorReaderCloser. E.g.

type volumeUnlockReader struct {
    *io.ReaderCloser
    v *UnixVolume
}

func (rw *volumeUnlockReader) Close() err {
    defer rw.v.unlock()
    return rw.Reader.Close()
}

func (v *UnixVolume) getBlockReader(ctx context.Context, path string) (io.Reader, error) {
    if err := v.lock(ctx); err != nil {
        return nil, err
    }
    f, err := os.Open(path)
    if err != nil {
        v.unlock()
        return nil, err
    }
    return &ReaderWrapper{f, v}, nil
}

func (v *UnixVolume) get(ctx context.Context, loc string, w *io.PipeWriter) {
    path := v.blockPath(loc)
    stat, err := v.stat(path)
    if err != nil {
        w.CloseWithError(v.translateError(err))
        return
    }
    rdr, err = v.getBlockReader(ctx, path)
    if rdr != nil {
        defer rdr.Close()
    }
    if err != nil {
        return
    }
    _, err = io.CopyN(w, rdr, n)
    w.CloseWithError(err)
}

If I'm following, the road map here is:

  1. Convert all the volumes to use getWithPipe/putWithPipe
  2. Change the volume interface to Get(ctx context.Context, loc string, w *io.PipeWriter) and Put(ctx context.Context, loc string, rdr io.ReadCloser) and invoke getWithPipe/putWithPipe in the handler
  3. Change getWithPipe/putWithPipe to take a target Writer or Reader instead of a buffer so that we can stream directly to/from the socket.

Is that about right?

#16 Updated by Tom Clegg almost 3 years ago

Peter Amstutz wrote:

The approach of passing functions (getWithPipe, get, getFunc) into other functions in order to wrap the behavior is hard to follow

Yeah, the focus is on making it easy to write/maintain N "getter" functions.

When all volumes have getters (or a Get() that returns a WriterTo) this glue can move up the stack and won't have to adapt one interface to another any more.

If I'm following, the road map here is:

  1. Convert all the volumes to use getWithPipe/putWithPipe
  2. Change the volume interface to Get(ctx context.Context, loc string, w *io.PipeWriter) and Put(ctx context.Context, loc string, rdr io.ReadCloser) and invoke getWithPipe/putWithPipe in the handler
  3. Change getWithPipe/putWithPipe to take a target Writer or Reader instead of a buffer so that we can stream directly to/from the socket.

Is that about right?

Yes, except getWithPipe/putWithPipe will probably just take a *Volume, and call the appropriate Get/Put method.

#17 Updated by Tom Clegg almost 3 years ago

Updated to use io.Writer and io.Reader interfaces instead of *io.PipeWriter and *io.PipeReader, so getter/putter funcs are more like io.WriterTo and io.ReaderFrom.

Removed some debug printf.

b4d9dfe1e7acb1f45c2cc699020bf9299a0db5c9

#18 Updated by Tom Clegg almost 3 years ago

... 3f556ca1b44b7e01874bd172abbb7cb3df0615db with BlockWriter and BlockReader interfaces instead of passing funcs around.

#20 Updated by Tom Clegg almost 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 71 to 100

Applied in changeset arvados|commit:b405f0f487f35f62d8362dc06981b83176b77d44.

Also available in: Atom PDF