Project

General

Profile

Actions

Bug #7492

closed

[keepproxy] Report upstream server errors as 502 Bad Gateway instead of 404

Added by Peter Amstutz over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
-
Target version:
Story points:
1.0

Description

Keepproxy (which uses keepclient) is failing requests because one of the upstream servers returned "read tcp 10.26.0.7:25107: use of closed network connection". This is being returned to the client as a 404, but actually should be a 502 (Bad Gateway) which would trigger retry logic on the client.

Requirements

If an error occurs in (*keepclient.KeepClient)Get() then keepproxy's http response status must be:
  • 404 if every keepstore service returned 404
  • 422 if every keepstore service returned a non-retryable error (but some didn't return 404)
  • 502 if any keepstore service had still returned a retryable error when tries_remaining ran out

Also, the list of errors received from the upstream keepstore services should be propagated to keepproxy's client in the response body. (Currently this information is logged by the SDK, so it shows up in keepproxy's log file but it is not available to the originator of the request.)

Implementation

We need to update (*keepclient.KeepClient)Get() to make it possible for keepproxy to distinguish between the above outcomes.

keepclient:
  • create an Error interface (similar to https://golang.org/pkg/net/#Error)
    • type Error struct {
          error
          Temporary() bool // Is the error temporary?
      }
  • create a multipleResponseError type that implements Error
    • type multipleResponseError struct {
          error
          isTemp bool
      }
      func (e *multipleResponseError) Temporary() bool {
          return e.isTemp
      }
      
  • create a NotFoundError type that implements Error
    • type NotFoundError struct {
          multipleResponseError
      }
      
  • Change BlockNotFound to a NotFoundError
    • var BlockNotFound = &NotFoundError{multipleResponseError{
          error:  errors.New("Block not found"),
          isTemp: false,
      }}
      
  • in Get(), count how many responses are http.StatusNotFound
  • when returning from Get() after all attempts have failed, make a NotFoundError or a multipleResponseError:
    • 
      if count404 == len(serversToTry) {
          err = BlockNotFound
      } else {
          err = &multipleResponseError{
              error:    fmt.Errorf("%s %s failed: %v", method, locator, errs),
              isTemp: len(serversToTry) > 0,
          }
      }
      

(Defer: add a way for clients to see a slice of all errors encountered, or at least the last error from each server. For now, that information is only available as a string, via the usual Error() method.)

keepproxy:
  • Do a type select on the error returned from keepclient.Get() to determine the appropriate HTTP response
    • 
      switch err := err.(type) {
      case keepclient.NotFoundError:
          // respond 404
      case keepclient.Error:
          if err.Temporary() {
              // respond 502
          } else {
              // respond 422
          }
      default:
          // respond 500
      }
      
  • In all cases, return err.Error() in the response body.

Subtasks 1 (0 open1 closed)

Task #7557: Review branch: 7492-keepproxy-upstream-errorsResolvedRadhika Chippada10/14/2015Actions

Related issues

Related to Arvados - Bug #7491: [SDK] Go keepclient should retry on network errorsResolvedPeter Amstutz10/08/2015Actions
Actions #1

Updated by Peter Amstutz over 8 years ago

  • Description updated (diff)
Actions #3

Updated by Brett Smith over 8 years ago

  • Target version set to 2015-10-28 sprint
Actions #4

Updated by Radhika Chippada over 8 years ago

  • Assigned To set to Radhika Chippada
Actions #5

Updated by Peter Amstutz over 8 years ago

keepclient.go:144 func getOrHead()

If the client was unable to get a final answer from one or more servers (either due to network failure or the server returned a HTTP 408, 429, or 500+ error) it should not return BlockNotFound, it should return some other error. Keepproxy probably does not need to be changed, because it already responds 502 Gateway error as the default case for errors other than BlockNotFound.

Comments from Tom on another #7491:

  • Return BlockNotFound only if all servers succeeded and replied exactly 404.
  • If not returning BlockNotFound, return something like fmt.Errorf("%+v", errs) so the caller can see/report what went wrong.

Unresolved questions (ask Tom):

  • What about 401 (Unauthorized) and 403 (Permission denied)?
    • If the same error is returned from every server, that's probably the one that should be returned, but need to define more error types and handle them in keepproxy.
  • What is the correct response if the query gets a mix of 401, 403 and/or 404 errors?
    • Could happen if the site has a mix of servers with permissions enabled or disabled.
Actions #6

Updated by Peter Amstutz over 8 years ago

  • In keepclient.go, define a new struct KeepRequestError
    • This should be equivalent to the arvados.errors.KeepRequestError in the Python SDK
    • Should contain an map of keep servers with the last error received resulting from contacting each keep server
  • getOrHead() should return BlockNotFound if all servers replied http 404.
  • getOrHead() should return KeepRequestError for any other error
  • In keepproxy.go, GetBlockHandler.ServeHTTP, add a check for KeepRequestError. This should return a HTTP 422 status code and return a formatted text string of KeepRequestError in the body.
Actions #7

Updated by Tom Clegg over 8 years ago

  • Description updated (diff)
Actions #8

Updated by Tom Clegg over 8 years ago

  • Description updated (diff)
Actions #9

Updated by Radhika Chippada over 8 years ago

  • Status changed from New to In Progress
Actions #10

Updated by Radhika Chippada over 8 years ago

All the details from the description are implemented as suggested. One exception:

"In all cases, return err.Error() in the response body"

Setting the body with error message is interfering with "http.Error(resp, err.Error(), status)" in the defer func (Panic: runtime error: invalid memory address or nil pointer dereference). This is resulting in errors in KeepClient which is expecting error and nil reader etc. So, I have not implemented this. Please clarify requirement around this.

Actions #11

Updated by Peter Amstutz over 8 years ago

Reviewing 7492-keepproxy-upstream-errors:

  • Is there a reason why you are using
    respErr.Error() == "Block not found'
    instead of
    respErr == keepclient.BlockNotFound
    or
    c.Check(err, Equals, keepclient.BlockNotFound)
    ?
  • Instead of calling kc.getSortedRoots(locator) twice, suggest doing this in getOrHead():
count404 := len(serversToTry)
...
if resp.StatusCode == 404 {
  count404--
}
...
if count404 == 0 {
}
  • Can we add a test to keepproxy that checks for permanent and temporary errors from the perspective of the client talking to the proxy?
Actions #12

Updated by Radhika Chippada over 8 years ago

Is there a reason why you are using respErr.Error() "Block not found'

Updated to use respErr keepclient.BlockNotFound and c.Check(err, Equals, keepclient.BlockNotFound). Much better now.

Instead of calling kc.getSortedRoots(locator) twice, suggest doing this in getOrHead() ...

Removed the unnecessary getSortedRoots call and used numServers variable instead. I did not want to make the change to count404-- as you suggested. Doing so results in code such as if count404 == 0 { set status code to 404 }, which can be quite counter-intuitive even if I use a better variable name :)

Can we add a test to keepproxy that checks for permanent and temporary errors from the perspective of the client talking to the proxy?

Added one more test set to do Get with a bad token and expired token. If you have any other specific tests in mind, please let me know.

Thanks.

Actions #13

Updated by Peter Amstutz over 8 years ago

Radhika Chippada wrote:

Is there a reason why you are using respErr.Error() "Block not found'

Updated to use respErr keepclient.BlockNotFound and c.Check(err, Equals, keepclient.BlockNotFound). Much better now.

Instead of calling kc.getSortedRoots(locator) twice, suggest doing this in getOrHead() ...

Removed the unnecessary getSortedRoots call and used numServers variable instead. I did not want to make the change to count404-- as you suggested. Doing so results in code such as if count404 == 0 { set status code to 404 }, which can be quite counter-intuitive even if I use a better variable name :)

Thanks.

Can we add a test to keepproxy that checks for permanent and temporary errors from the perspective of the client talking to the proxy?

Added one more test set to do Get with a bad token and expired token. If you have any other specific tests in mind, please let me know.

Yes, can we have a test where keepproxy generates a Temporary() error? The original goal of this story was for keepproxy to start returning 502 Bad Gateway when one of the keep servers behind the proxy failed in a "temporary" way so as to trigger retry behavior on the external client.

Actions #14

Updated by Radhika Chippada over 8 years ago

Yes, can we have a test where keepproxy generates a Temporary() error ...

Added another test to keepproxy_test with "connection refused" temporary error. Also, improved test assertions in keepclient_test. Thanks.

Actions #15

Updated by Peter Amstutz over 8 years ago

Radhika Chippada wrote:

Yes, can we have a test where keepproxy generates a Temporary() error ...

Added another test to keepproxy_test with "connection refused" temporary error. Also, improved test assertions in keepclient_test. Thanks.

Unfortunately, that is not testing what I was looking to be tested. This is testing whether a failed connection to the proxy results in a Temporary error. I'm asking for a test where the client can connect to the proxy, but the proxy can't connect to the keep node.

Actions #16

Updated by Peter Amstutz over 8 years ago

0bb6625 looks good to me.

Actions #17

Updated by Radhika Chippada over 8 years ago

  • Story points set to 1.0
Actions #18

Updated by Radhika Chippada over 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:b90d3ea4986a1af59395f223e0320f07cc91c272.

Actions

Also available in: Atom PDF