Feature #7200
closed[Keep] keepproxy supports "index" API
Description
- Unless a complete index is retrieved for every service (i.e., http 200 and terminated by a single blank line), the index is incomplete, and keepproxy's response must not end with a blank line.
- If a block appears 2x in each of N keepstore "index" responses, it appears 2N times in keepproxy's response, too.
- If the index is complete, it must end with a single blank line.
- The index is given in whatever order is convenient for keepproxy (the caller does not expect it to be sorted).
Implementation¶
New public function in sdk/go/keepclient:
// GetIndex retrieves a list of blocks stored on the given server whose hashes // begin with the given prefix. The returned reader will return an error (other // than EOF) if the complete index cannot be retrieved. This should only be // expected to return useful results if the client is using a "data manager token" // recognized by the Keep services. func (kc *KeepClient) GetIndex(keepServiceUUID, prefix string) (io.Reader, error) { ... }
Tests for this function must include "caller sees error if partial index is retrieved from keepstore" (i.e., no blank line before EOF).
Some of the existing code in Data Manager may be useful here. (Data Manager might also deserve to be updated to use this new SDK feature instead of having duplicate code, but this might wait for a separate branch/story.)
Related issues
Updated by Ward Vandewege about 9 years ago
- Target version changed from Arvados Future Sprints to 2015-09-30 sprint
Updated by Brett Smith about 9 years ago
- Target version changed from 2015-09-30 sprint to Arvados Future Sprints
Updated by Radhika Chippada about 9 years ago
- Status changed from New to In Progress
- Assigned To set to Radhika Chippada
Updated by Radhika Chippada about 9 years ago
- Added GetIndex method support in keepclient and added several tests.
- Added IndexHandler in keepproxy. Used the keepclient method to implement this handler. Added tests.
- Added a couple more tests to keepstore/handler_test.go to verify response with valid but no such prefix and invalid prefix format.
- Pending question: In keepclient how should this comment be addressed: "This should only be expected to return useful results if the client is using a data manager token recognized by the Keep services" ?
Updated by Radhika Chippada about 9 years ago
- Target version changed from Arvados Future Sprints to 2015-09-30 sprint
Updated by Tom Clegg about 9 years ago
At eaf51fc
ErrorNoSuchKeepServer
and ErrorIncompleteIndex
should be ErrNoSuchKeepServer
and ErrIncompleteIndex
to match Go conventions (we already have some other offenders, but we should at least stick to Go idioms for new stuff).
ErrorGetIndex
seems to mean "no information except that the real error was non-nil and was encountered in GetIndex". Seems like we should just return err
itself in these three cases.
We shouldn't print debug logs from SDK methods. Just return an error, and let the application decide how to handle it.
Keeping the whole index in RAM isn't ideal but I think we can live with it for now, and refactor it later as a decorator that passes data from the underlying reader but still notices/interferes if the termination isn't correct. Meanwhile I think we should at least avoid making string copies of the byte slice:-if (!strings.HasSuffix(string(respBody), "\n\n")) && (string(respBody) != "\n") +if !bytes.Equal(respBody, []byte("\n")) && !bytes.HasSuffix(respBody, []byte("\n\n"))
-return strings.NewReader(string(respBody)), nil +return bytes.NewReader(respBody), nil
Since the only purpose of the trailing newline is to detect an incomplete read even though HTTP can't signal errors encountered after the body starts sending (and Go readers don't have this limitation) I think GetIndex should strip the trailing newline from the data returned to the caller. That way the caller can use something like https://golang.org/pkg/bufio/#Reader.ReadString to read the data, without any special handling for the trailing newline.
It doesn't look like it will make any difference, but for the sake of realism the first two StubIndexHandlers' data (the "complete" ones) should have size and timestamp, perhaps like:- []byte(string(hash) + "\n\n") + []byte(hash + "+3 1443559274\n\n")
- (that
string(hash)
looks superfluous -- hash is already a string, right?)
In IndexHandler, unroll the error handling:
- switch req.Method {
- case "GET":
- (normal flow)
- default:
- (handle error)
- return
- }
+ if req.Method != "GET" {
+ (handle error)
+ return
+ }
+ (normal flow)
There are two if err != nil { break }
which seem like they should say status = http.StatusBadGateway; return
instead -- that way we wouldn't need the switch err
at the end of the func.
Rather than buffering all of the responses in RAM, we could just do resp.Write(readBytes)
as soon as we have a complete index. (Aside from using less RAM, this might let us notice sooner and stop doing work if the client hangs up before we've fetched all indexes.)
Typo "implemenation" in keepproxy.go comment
The "GetIndex with prefix" test in keepproxy_test.go doesn't seem to check whether the results are actually restricted to the given prefix. It looks like we'll need to write another block in order to get different results from "GetIndex with no prefix" and "GetIndex with prefix". Granted, the test right after it ("valid but no such prefix") is fairly convincing, but still...
Updated by Tom Clegg about 9 years ago
Radhika Chippada wrote:
- Pending question: In keepclient how should this comment be addressed: "This should only be expected to return useful results if the client is using a data manager token recognized by the Keep services" ?
This doesn't require any new code, it was just meant as a hint to the SDK user that the API is only intended for admin/system use: "normal" users/tokens won't be able to get anything useful out of GetIndex
. For example, if someone is hunting through the godoc pages for an API like "get a list of all blocks I can read" (an API that doesn't exist at all right now) it would be nice to avoid wasting their time trying to do it with GetIndex
.
Suggestions for improving wording?
Updated by Radhika Chippada about 9 years ago
ErrorNoSuchKeepServer and ErrorIncompleteIndex should be ErrNoSuchKeepServer ...
Done
ErrorGetIndex seems to mean "no information ...
Done
We shouldn't print debug logs from SDK methods. Just return an error, and let the application decide how to handle it.
Done
Keeping the whole index in RAM isn't ideal but I think we can live with it for now, and refactor it later as a decorator that passes data from the underlying reader but still notices/interferes if the termination isn't correct. Meanwhile I think we should at least avoid making string copies of the byte slice:
Done
Since the only purpose of the trailing newline is to detect an incomplete read even though HTTP can't signal errors encountered after the body starts sending (and Go readers don't have this limitation) I think GetIndex should strip the trailing newline from the data returned to the caller. That way the caller can use something like https://golang.org/pkg/bufio/#Reader.ReadString to read the data, without any special handling for the trailing newline.
Done. However, this now needed me add an extra new line in keepproxy. Long story short: LocalRoots includes keepservers as well as keepproxy urls. Hence, when keepclient tries to GetIndex from the proxy, it needs to terminate with "\n\n" and hence stripping it in keepclient (alas) makes keepproxy response to be "incomplete". What fun.
It doesn't look like it will make any difference, but for the sake of realism the first two StubIndexHandlers' data (the "complete" ones) should have size and timestamp, perhaps like:
Done
(that string(hash) looks superfluous -- hash is already a string, right?)
Done. Huh.
In IndexHandler, unroll the error handling:
Done. So much better now.
There are two if err != nil { break } which seem like they should say status = http.StatusBadGateway; return instead -- that way we wouldn't need the switch err at the end of the func.
Done
Rather than buffering all of the responses in RAM, we could just do resp.Write(readBytes) as soon as we have a complete index. (Aside from using less RAM, this might let us notice sooner and stop doing work if the client hangs up before we've fetched all indexes.)
Done
Typo "implemenation" in keepproxy.go comment
Done
The "GetIndex with prefix" test in keepproxy_test.go doesn't seem to check whether the results are actually restricted to the given prefix. It looks like we'll need to write another block in order to get different results from "GetIndex with no prefix" and "GetIndex with prefix". Granted, the test right after it ("valid but no such prefix") is fairly convincing, but still...
Done. Improved the tests to also check for "otherLocators" when no prefix is used and only expected locators are returned when a prefix is specified.
This doesn't require any new code, it was just meant as a hint to the SDK user that the API is only intended for admin/system use: "normal" users/tokens won't be able to get anything useful out of GetIndex. For example, if someone is hunting through the godoc pages for an API like "get a list of all blocks I can read" (an API that doesn't exist at all right now) it would be nice to avoid wasting their time trying to do it with GetIndex. Suggestions for improving wording?
It makes sense now. But, I think instead of "This should only be expected to return useful results ..." => "This will only return useful results ... " might make it easier to understand. wdyt?
Updated by Tom Clegg about 9 years ago
Noticed another thing: I think we can remove all of the stuff about contentLen in (IndexHandler)ServeHTTP()
. By the time we call resp.Header().Set()
, the header has already been sent to the client, so it just goes nowhere. (Also, the trailing-newline check makes it unnecessary for the client to compare the Content-Length header with the actual number of bytes received.)
In GetIndex, resp.Body != nil
is always true when we check it, according to docs -- "When err is nil, resp always contains a non-nil resp.Body." ...so we don't need that condition.
(re IRC question) yes, I think we need a defer resp.Body.Close()
right before ioutil.ReadAll(resp.Body)
It makes sense now. But, I think instead of "This should only be expected to return useful results ..." => "This will only return useful results ... " might make it easier to understand. wdyt?
That sounds better. Or how about: "This is meant to be used only by system components and admin tools. It will return an error unless [...]"? Might help to make it a separate paragraph too.
Updated by Radhika Chippada about 9 years ago
Addressed all the comments from note #10 as well. Thanks.
Updated by Tom Clegg about 9 years ago
c.Assert(totalLocators, Equals, matchingLocators)
-- shouldn't totalLocators be greater than matchingLocators in this case?- It would be nice to use the same logic in both "GetIndex, then count results" loops. Currently one uses matchingLocators/otherLocators, the other uses totalLocators/matchingLocators. Perhaps something like
for _, spec := range []struct{ prefix string expectTestHash bool expectOther bool }{ {"", true, true}, {"abcdef", false, false}, {hash[:3], true, false}, } { indexReader, err := kc.GetIndex("proxy", spec.prefix) ... gotTestHash := 0 gotOther := 0 for _, locator := range locators { c.Check(locator[:len(spec.prefix)], Equals, spec.prefix) if locator[:32] == hash { gotTestHash++ } else { gotOther++ } } c.Check(gotTestHash == 2, Equals, spec.expectTestHash) c.Check(gotOther > 0, Equals, spec.expectOther) }
Updated by Tom Clegg about 9 years ago
Ooh. Assuming this is as easy as it looks, how about:
- var readBytes []byte
- readBytes, err = ioutil.ReadAll(reader)
- if err != nil {
- status = http.StatusBadGateway
- return
- }
-
- // Got index for this server; write to resp
- _, err := resp.Write(readBytes)
+ _, err := io.Copy(resp, reader)
Updated by Tom Clegg about 9 years ago
Oops, I was wrong about the position of defer resp.Body.Close()
in GetIndex
in keepclient/keepclient.go
. It should be right after we find out err != nil
. (If we get err==nil
and resp.StatusCode!=OK
we still have to close resp.Body
).
Rest LGTM, thanks!
Updated by Radhika Chippada about 9 years ago
- The test update with code reuse is much better. Used this with minor updates and additional comments.
- Yes, io.Copy worked.
- Updated the placement of resp.Body.Close()
- Also, did a few minor updates around variable names and comments.
Thanks.
Updated by Radhika Chippada about 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:6c31bea91f136bb300376847ced1fd965e037dd3.