Project

General

Profile

Actions

Feature #4194

closed

[Keep] consistent logging for keepstore requests

Added by Tim Pierce over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
Keep
Target version:
Story points:
2.5

Description

Currently, each handler is expected to implement its own logging. As a result, log formats are inconsistent, handlers get written without any logs at all, and at best there is much code repetition.

Instead, the router should implement a generic request/response logging system with the following features.
  • Every transaction uses the same format, e.g., "ipaddr resp_status resp_size elapsed uri err"
  • Individual handlers do not worry about logging or returning error codes -- they just set the response status using http.Error(), http.StatusOK, etc., and let the logging system figure out what to say by inspecting the request and response objects.
Implementation:
  • Perhaps this:
    •     // Tell the built-in HTTP server to direct all requests to the REST
          // router.
          http.Handle("/", MakeRESTRouter())
      
  • ...can be replaced with something like this?
    •     // The built-in HTTP server will pass each request to this function,
          // which forwards the request to the REST router and logs the outcome.
          router := MakeRESTRouter()
          http.HandleFunc("/", func(resp ResponseWriter, req *Request) {
              router.ServeHTTP(resp, req)
              // log stuff here!
          })
      

Subtasks 1 (0 open1 closed)

Task #4542: Review branch: 4194-keep-loggingResolvedTim Pierce11/16/2014Actions
Actions #1

Updated by Tim Pierce over 9 years ago

  • Tracker changed from Bug to Feature
  • Category set to Keep
  • Assigned To set to Tim Pierce
  • Story points set to 0.5

Keepstore logging has been made very consistent for GET and PUT requests. We should update /index and /status requests to log similarly.

Keepstore should also support a verbose or debugging logging mode that logs authentication failures, to make it easier to debug problems with data manager tokens etc.

Actions #2

Updated by Tom Clegg over 9 years ago

  • Description updated (diff)
Actions #3

Updated by Tim Pierce over 9 years ago

  • Target version changed from 2014-10-29 sprint to Arvados Future Sprints
Actions #4

Updated by Tom Clegg over 9 years ago

  • Assigned To deleted (Tim Pierce)
Actions #5

Updated by Tom Clegg over 9 years ago

  • Target version changed from Arvados Future Sprints to 2014-11-19 sprint
Actions #6

Updated by Tom Clegg over 9 years ago

  • Story points changed from 0.5 to 1.0
Actions #7

Updated by Tom Clegg over 9 years ago

  • Assigned To set to Radhika Chippada
Actions #8

Updated by Ward Vandewege over 9 years ago

  • Story points changed from 1.0 to 2.5
Actions #9

Updated by Tom Clegg over 9 years ago

ResponseWriter doesn't let you read the response status code later, but you could wrap it with a LoggingResponseWriter that does:

https://groups.google.com/forum/#!topic/golang-nuts/fxUz5WK6Hek

Actions #10

Updated by Tom Clegg over 9 years ago

  • Subject changed from [Keep] consistent logging for Keep requests to [Keep] consistent logging for keepstore requests
Actions #11

Updated by Tim Pierce over 9 years ago

Capturing the informal review we just did face-to-face at d4224bfe:

  • The WrapRESTRouter code could wrap the mux.Router directly inside MakeRESTRouter, and remove the need for any other code to know it needs to wrap the router in order to add logging.
  • The loggingWriter should make a copy of the data that it's being passed and not just store a reference to the slice. If we only care about the first 200 bytes of the response anyway, this could be something like:
type LoggingResponseWriter struct {
    data bytes.Buffer
    status int
    http.ResponseWriter
}

func (w *LoggingResponseWriter) Write(data []bytes) {
    if w.data.Len() < 200 {
        w.Write(data)
    }
    w.ResponseWriter.Write(data)
}

I had some worries about the way this code embeds http.ResponseWriter but after reviewing the docs more closely I'm comfortable that you're right. That's a nice approach.

Actions #12

Updated by Tim Pierce over 9 years ago

Reviewing at 0e5dfbb

This looks great. A couple of suggestions for organizing the code:

Let's put router_wrapper.go into its own package:
  • say, services/keepstore/router (so it can be imported as git.curoverse.com/arvados.git/services/keepstore/router
  • call the router type LoggingRouter instead of RESTRouterWrapper
  • rename MakeRESTRouterWrapper to New (so a program which has imported the router package invokes it as router.New()

That makes it easier to untangle this package when we want to import it into another Keep package, and also helps to explain what it is that we're doing with the router. LoggingRouter is more descriptive than RESTRouterWrapper.

Defining methods on a pointer to a struct rather than on the struct itself tends to make the code more consistent. There's a strong convention for doing this with struct types in particular. See https://golang.org/doc/effective_go.html#pointers_vs_values for more details. I'd like to follow this when there isn't a strong reason not to, i.e.:

func (wrapper *RESTRouterWrapper) ServeHTTP(resp http.ResponseWriter, req *http.Request) {

instead of

func (wrapper RESTRouterWrapper) ServeHTTP(resp http.ResponseWriter, req *http.Request) {

(Consequently, MakeRESTRouterWrapper should also return *RESTRouterWrapper)

Actions #13

Updated by Radhika Chippada over 9 years ago

Tim, addressed all your feedback, except the first one about moving the Logging router into its own package named "router".

When I tried to move this file into router package, the package git.curoverse.com/arvados.git/services/keepstore/router needed to import git.curoverse.com/arvados.git/services/keepstore package (for handler stuff) and the package git.curoverse.com/arvados.git/services/keepstore needed to import git.curoverse.com/arvados.git/services/keepstore/router (for logging router).

I do not (yet) know if this is even permissible or if go has circular references issues etc; but, it does not seem right doing it this way in the first place. I think if we really want to do this, we need to move more of the code into appropriate packages such as handler (handler related code), router, perms, etc while leaving just the "main" related stuff in keepstore package. It seems that at a minimum, we should move all but main related code into a keepstore/somename package and the router into keepstore/router package.

I am not sure if we are interested in going this path now? Let me know what you think. Thanks.

Actions #14

Updated by Tim Pierce over 9 years ago

Radhika Chippada wrote:

When I tried to move this file into router package, the package git.curoverse.com/arvados.git/services/keepstore/router needed to import git.curoverse.com/arvados.git/services/keepstore package (for handler stuff) and the package git.curoverse.com/arvados.git/services/keepstore needed to import git.curoverse.com/arvados.git/services/keepstore/router (for logging router).

Ugh. Okay, let's scrap this for now and add a ticket to move the router into its own package.

Actions #15

Updated by Radhika Chippada over 9 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:c621e2e901f5b5de4f72b1c6a9d8816aa5745292.

Actions #16

Updated by Tom Clegg over 9 years ago

Couple of comments at 07bf4bb

I think we should lose the "show first 200 bytes" stuff. There are various problems here which don't seem worth solving:
  • Writing arbitrary user-supplied binary data to log files is dangerous. It's easy for users to write data that looks like a real log message, sends control chars to an admin user's terminal, confuses post-processors, etc.
  • Current log format can make data indistinguishable from length ("120") (admittedly trivial to fix)
  • Buffering a copy of the entire ~64MiB response is wasteful

Instead we could just update a ResponseLength field during Write(), and print that in the log message.

(Is the "log content of small blocks" feature important for some reason I'm not thinking of?)

Formatting:
  • Golang says always use tabs to indent, never spaces.
  • Go will fix this for you: go fmt services/keepstore/logging_router.go
Actions #17

Updated by Radhika Chippada over 9 years ago

Tom, addressed your comments. Yes, you are right about printing data and I was also concerned about it, but lost track of it. Thanks for catching.

One question about the length versus data. The log message for PUT would be:

With response data string printed: PUT e4d909c290d0fb1ca068ffaddf22cbd0 200 e4d909c290d0fb1ca068ffaddf22cbd0+44

With length of data string printed: PUT e4d909c290d0fb1ca068ffaddf22cbd0 200 36

Here the log statement is printing the size of data string (e4d909c290d0fb1ca068ffaddf22cbd0+44), rather than the actual data size witch is 44.

Do we want to look for data ending with "+\d" and print the value of "\d" in these cases?

Thanks.

Actions #18

Updated by Tim Pierce over 9 years ago

Radhika Chippada wrote:

One question about the length versus data. The log message for PUT would be:

With response data string printed: PUT e4d909c290d0fb1ca068ffaddf22cbd0 200 e4d909c290d0fb1ca068ffaddf22cbd0+44

With length of data string printed: PUT e4d909c290d0fb1ca068ffaddf22cbd0 200 36

Here the log statement is printing the size of data string (e4d909c290d0fb1ca068ffaddf22cbd0+44), rather than the actual data size witch is 44.

Do we want to look for data ending with "+\d" and print the value of "\d" in these cases?

IMHO, no -- the +44 refers to the amount of data available at that block, not the amount of data returned in the HTTP request. We always want to log the latter.

Actions #19

Updated by Tom Clegg over 9 years ago

Radhika Chippada wrote:

With length of data string printed: PUT e4d909c290d0fb1ca068ffaddf22cbd0 200 36

Here the log statement is printing the size of data string (e4d909c290d0fb1ca068ffaddf22cbd0+44), rather than the actual data size witch is 44.

Looks good to me -- "number of bytes sent to the client" is a good number to have. For PUT this is pretty boring but for GET it's a bit more useful. When we add support for Range requests it'll get a bit more interesting.

Do we want to look for data ending with "+\d" and print the value of "\d" in these cases?

Nah... let's save this for another day but I think "bytes received from client" would be the appropriate thing to log if we wanted to log the data size for PUT requests.

I wonder if this should be += instead of =?

+       loggingWriter.Length = len(data)

Is there any reason not to print the Length field for non-200 status codes? (It will be boring, like PUT, but consistent and much better for post-processing.)

Actions #20

Updated by Radhika Chippada over 9 years ago

Great. I update "=" to "+=". I also updated to use the same log statement in both error and success cases. Thanks.

Actions #21

Updated by Tom Clegg over 9 years ago

b1f3c94 LGTM, thanks!

Actions

Also available in: Atom PDF