Project

General

Profile

Actions

Idea #1885

closed

Keep proxy service: forward external requests to the appropriate keep servers

Added by Tom Clegg over 10 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Start date:
05/08/2014
Due date:
Story points:
3.0

Description

Useful for external and federated clients.

Necessary:
  • Use API host and token provided in environment variables to get the list of Keep servers at startup.
  • Use node selection algorithm (see Python SDK) to select probe order when reading and writing.
  • Forward Authorization header to Keep servers.
  • PUT:
    • use 2x replication, i.e., write to 2 Keep servers (future: client specifies this in a header)
    • in response headers, say "X-Data-Replication: 2"
  • GET:
    • 405 if --no-GET flag was given.
    • For now: do not implement GET. Do not even start up unless --no-GET flag is given.
  • GET /index*:
    • 403
Important but not strictly a blocker:
  • PUT: verify Authorization header by requesting /users/current from apiserver, respond 401 if not valid.
Nice to have:
  • Periodically request the latest list of Keep servers. (This has an easy workaround: TERM the server as needed.)
  • HEAD:
    • 405 if --no-HEAD flag was given, otherwise forward to Keep servers in probe order until it succeeds, send first successful response (or last failure, if none) to client
Implementation
  • Probably best to do this with Go. Perhaps something like https://github.com/elazarl/goproxy or even plain net/http. (But definitely as a separate project, not by adding "proxy mode" to the Keep server itself.)

Subtasks 4 (0 open4 closed)

Task #2797: Write proxy testsResolvedPeter Amstutz05/08/2014Actions
Task #2796: Write proxy server sideResolvedPeter Amstutz05/08/2014Actions
Task #2775: Decide what infrastructure to build proxy service onResolvedPeter Amstutz05/08/2014Actions
Task #2852: Review 1885-keep-proxyResolvedWard Vandewege05/08/2014Actions
Actions #1

Updated by Tom Clegg about 10 years ago

  • Subject changed from Keep can be accessed from outside the cluster, e.g., using some sort of gateway or proxy to Keep federation: proxy service forwards external requests to the appropriate keep server(s)
Actions #2

Updated by Tom Clegg about 10 years ago

  • Description updated (diff)
Actions #3

Updated by Tom Clegg almost 10 years ago

  • Subject changed from Keep federation: proxy service forwards external requests to the appropriate keep server(s) to Keep proxy service forwards external requests to the appropriate keep servers
  • Description updated (diff)
  • Story points set to 3.0
Actions #5

Updated by Tom Clegg almost 10 years ago

  • Project changed from 35 to Arvados
  • Subject changed from Keep proxy service forwards external requests to the appropriate keep servers to Keep proxy service: forward external requests to the appropriate keep servers
Actions #6

Updated by Tom Clegg almost 10 years ago

  • Description updated (diff)
Actions #7

Updated by Tom Clegg almost 10 years ago

  • Target version set to 2014-05-28 Pipeline Factory
Actions #8

Updated by Tom Clegg almost 10 years ago

  • Description updated (diff)
  • Story points changed from 3.0 to 5.0
Actions #9

Updated by Peter Amstutz almost 10 years ago

  • Assigned To set to Peter Amstutz
Actions #10

Updated by Peter Amstutz almost 10 years ago

  • Story points changed from 5.0 to 3.0
Actions #11

Updated by Ward Vandewege almost 10 years ago

go.sh: you need to force GOPATH to be absolute, for example like this:

rootdir=$(dirname $(readlink -f $0))

otherwise, go blows up like this if you run go.sh from a symlinked path:

go: GOPATH entry is relative; must be absolute path: ".".
Run 'go help gopath' for usage.

go.sh: needs documentation for arguments required. Is go.sh build keep what you are intending it to be run as?

go.sh: does not seem to replicate build.sh's functionality - it does not put the keep binary in bin/. Or maybe I'm not calling it the right way, cf. the previous point.

services/keep/src/arvados.org/keepproxy/keepproxy.go: has some commented out code that should be removed

services/keep/src/arvados.org/keepproxy/keepproxy.go: don't hardcode https on line 165, that should be configurable (but default to https) or ideally auto-discovered

services/keep/src/arvados.org/keepproxy/keepproxy_test.go: same, don't hardcode https on line 74

services/keep/src/arvados.org/keepproxy/keepproxy_test.go: these two commands need to check for failure and print the command output if run_test_server.py fails:

exec.Command("python", "run_test_server.py", "start").Run()
exec.Command("python", "run_test_server.py", "start_keep").Run()

FYI - I'm planning to restrict the number of concurrent connections to the Keep Proxy at the nginx level.

Any thoughts on expected memory use patterns of keep proxy? I assume the worst case scenario (in put mode) would be many clients all writing full 64MiB blocks, slowly. That scenario is what the limit on concurrent connections will protect against.

Please write a log line for every GET/HEAD/PUT request to STDOUT. Something simple will do for now. In Keep we use timestamp method /hash at the moment, and that's plenty fine.

Other than that, I'd say this looks pretty good! Tests pass.

Actions #12

Updated by Peter Amstutz almost 10 years ago

  1. Added comment to go.sh. Go.sh is just a wrapper around the go toolchain so regular go commands apply.
  2. "go install arvados.org/keepproxy" should build keepproxy and copy it to bin/
  3. The commented out code was to force the user to disable GET, which in Tom's original writeup was not supposed to be implemented (but I did it anyway). I'm not clear on the status of keep permissions, in theory the proxy supports it but I haven't tested it.
  4. Discussed https when calling API server on IRC
  5. Checks error return on .Run() now in the tests
  6. Regarding memory usage:
    • GET requests should stream through with minimal buffering. It effectively goes directly from the http.Response reader from the Keep server directly into the http.ResponseWriter that sends to the client, with only a small stub that incrementally tallys the md5 checksum. The buffer used by the Go http library is something like 32 KiB
    • PUT requests pre-allocate a buffer based on the Content-Length in the PUT header, and that buffer is held until the block has been sufficiently replicated or it gives up after contacting all Keep servers. That means we probably do want a relatively conservative cap on the number of concurrent PUT requests allowed, e.g. 16 * 64 MiB will require 1 GiB of RAM.
  7. Added a bunch of logging, and uncovered a rather serious and stupid bug in not correctly failing on authorization errors :-)
Actions #13

Updated by Ward Vandewege almost 10 years ago

This looks good to merge to me; just one change please: also log requests that are 404 or fail in other ways. It can just be the same log line as for a successful request for now, unless you want to upgrade it a little bit and add the HTTP response code in the log line, that would be nice.

Actions #14

Updated by Anonymous almost 10 years ago

  • Status changed from New to Resolved

Applied in changeset arvados|commit:ca77ae5cb94c31232abc8923174280fd848ef408.

Actions

Also available in: Atom PDF