Story #2769

[Keep] Keep supports DELETE requests

Added by Tim Pierce about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Tim Pierce
Category:
Keep
Target version:
Start date:
07/25/2014
Due date:
% Done:

100%

Estimated time:
(Total: 22.00 h)
Story points:
3.0

Description

New behavior:
  • When DELETE request is received, verify token in Authorization header
    • call arvados.v1.users.current method and confirm is_admin==true
    • call arvados.v1.api_client_authorizations.get and confirm the token's scopes contains "all" [1]
  • Maintain a cache {token → (is_admin, verified_timestamp)} with configurable TTL, so a series of N delete requests doesn't result in N token verifications.
  • If the token passes verification, delete all copies of the specified block from all (non-read-only) volumes.
  • Return value:
    • Respond 200 OK when the requested block was found on a local volume, with body like {"copies_deleted":2,"copies_not_deleted":1} (this would mean one copy was found on a read-only volume, two copies were found on writable volumes).
    • Respond 404 Not Found if no blocks present at all (i.e. {"copies_deleted":0,"copies_not_deleted":0}
    • Respond 403 Forbidden if the user is not allowed to delete blocks for some reason (additional work here has been moved to #3483)
Race conditions:
  • Refuse to delete a blob that has been PUT recently (i.e., age less than permission signature TTL). Enforcing this will involve tracking "most recent PUT" for each file, perhaps by updating modification timestamps.
    • Respond 422 if the target of an otherwise valid DELETE request is too new.
Configuration:
  • -no-delete flag disables DELETE functionality. Respond to valid DELETE requests with 405 (method not allowed) instead of deleting anything.
  • -token-cache-ttl argument specifies maximum age of token cache. Do not use a cache value older than this. (But do not bother with aggressive garbage collection.)
  • -token-cache-size argument specifies maximum number of entries in token cache. Delete oldest entry if the cache exceeds this size.
Notes:
  • The same token cache will also be useful in the future for things like enforcing storage quotas, so please make it easy to add fields to the cache values.

1 If this turns into a big deal, skip it for now.


Subtasks

Task #3273: Add configuration flags for DELETE.ResolvedTom Clegg

Task #3271: Implement DELETE requestResolvedTim Pierce

Task #3422: Review 2769-keep-delete-requestResolvedMisha Zatsman

Task #3533: Review 2769-disable-delete-flagResolvedPeter Amstutz

Associated revisions

Revision 6a6fe8d1 (diff)
Added by Tim Pierce almost 6 years ago

2769: reorganize REST handlers

Reorganizing REST handlers into their own source file.

Refs #2769.

Revision 95d963b2
Added by Tim Pierce almost 6 years ago

Merge branch '2769-keep-delete-request'

Refs #2769

Revision fb460bdf (diff)
Added by Tim Pierce almost 6 years ago

2769: code review comments

  • Updated TODO in handler_test.go.
  • Log DELETE requests in DeleteHandler.

Refs #2769

Revision fe5556e0 (diff)
Added by Tim Pierce almost 6 years ago

2769: code review comments

  • DeleteHandler returns http.StatusNotFound when no blocks could be found at all,
    and http.StatusMethodNotAllowed when all delete attempts
    fail (e.g. read-only volumes)
  • Unit test cleanup: "http://localhost:25107" is not actually necessary
    to build a http.Request, and it is confusing, so leave it out.

Refs #2769.

Revision 52192560 (diff)
Added by Tim Pierce almost 6 years ago

2769: more code review comments

  • Abandon HTTP 405, return HTTP 200 whenever any blocks were found at
    all
  • Return a JSON message only with HTTP 200 (404 implies
    copies_deleted=0, copies_failed=0)
  • More clarity in unit tests

Refs #2769

Revision b789ede3
Added by Tim Pierce almost 6 years ago

Merge branch '2769-keep-delete-request'

Refs #2769.

Revision db783c02 (diff)
Added by Tim Pierce almost 6 years ago

2769: ask API server for user's admin status

api_client.go implements new methods IsAdmin and HasUnlimitedScope,
which use the sdk.ArvadosClient to consult the API server.

  • IsAdmin(api_token): returns true if is_admin=true in the user record
    associated with this api_token, false otherwise.
  • HasUnlimitedScope(api_token): returns true if the "scopes" field in
    this token's api_client_authorizations includes the string "all",
    false otherwise.
  • CanDelete now checks any non-DataManager token and returns true if
    both IsAdmin and HasUnlimitedScope return true.

api_client_test.go tests the methods in api_client.go by calling them
against a fake API server in a goroutine.

Refs #2769.

Revision 6d0b4ad4
Added by Tom Clegg almost 6 years ago

Merge branch '2769-disable-delete-flag' closes #2769

Revision d2469485 (diff)
Added by Tim Pierce almost 6 years ago

2769: cache API tokens.

The api_client wrapper functions cache user information retrieved from
the API server (the User and ApiClientAuthorization records).

TestTokenCache is a unit test that exercises the cache to demonstrate
that when a token is found in the cache but not in the API server, the
cached value is returned.

The token_cache_ttl command-line flag sets the TTL for cached
tokens. The default is one hour.

Refs #2769.

History

#1 Updated by Tom Clegg almost 6 years ago

  • Story points set to 2.0

#2 Updated by Radhika Chippada almost 6 years ago

  • Subject changed from Keep supports DELETE requests to [KEEP] Keep supports DELETE requests
  • Category set to Keep

#3 Updated by Radhika Chippada almost 6 years ago

  • Subject changed from [KEEP] Keep supports DELETE requests to [Keep] Keep supports DELETE requests

#4 Updated by Tom Clegg almost 6 years ago

  • Description updated (diff)
  • Story points changed from 2.0 to 3.0

#5 Updated by Tom Clegg almost 6 years ago

  • Description updated (diff)

#6 Updated by Tom Clegg almost 6 years ago

  • Target version set to Arvados Future Sprints

#7 Updated by Tim Pierce almost 6 years ago

  • Target version changed from Arvados Future Sprints to 2014-08-06 Sprint

#8 Updated by Tom Clegg almost 6 years ago

  • Description updated (diff)

#9 Updated by Tim Pierce almost 6 years ago

  • Assigned To set to Tim Pierce

#10 Updated by Tim Pierce almost 6 years ago

Two observations:

  • This story makes reference to read-only volumes, but currently Keep has no notion of a read-only volume. This may not make a difference in practice -- the blob server can still keep a count of volumes for which the delete attempt failed, and report that count to the user -- but I'm noting it here in case we need a story for Keep to distinguish between read-only and read-write volumes.
  • If the blob server is configured to serialize i/o on volumes, should it also serialize DELETE requests, or can those be executed asynchronously?

#11 Updated by Tim Pierce almost 6 years ago

  • Status changed from New to In Progress

#12 Updated by Tim Pierce almost 6 years ago

Question: is it appropriate for Keep to return 200 OK if the block could not be found at all, or if no copies could be deleted (i.e. only on read-only volumes)?

We could return e.g.
  • 404 Not Found if the block is not present on any volume
  • 405 Method Not Allowed if the block is present but no copies could be deleted

#13 Updated by Tim Pierce almost 6 years ago

Branch 2769-keep-delete-request is ready for review @ b9d9b77

This branch implements basic DELETE support for Keep:

The DeleteHandler method implements DELETE requests for Keep, checking that the user is authorized to issue DELETE and then deleting the requested block from local volumes.

  • CanDelete checks that the user with the given token is authorized to delete blocks from Keep.
  • MockVolume and UnixVolume objects provide a Delete method for deleting a block from that volume.
  • TestDeleteHandler tests:
    • Unauthenticated requests
    • Requests from a non-admin user
    • Requests from a non-admin user for a nonexistent block
    • Requests from a non-admin user for an existing block
Not yet implemented:
  • Checking token status with API server
  • Verifying block signatures
  • Token cache

#14 Updated by Tom Clegg almost 6 years ago

Without claiming to have properly reviewed this, I noticed a couple of minor things:
  • Do we log successful DELETE requests somewhere? (I see where GET and PUT are logged, but not DELETE, index, or status... maybe those are missing too?)
  • In TODO comment, for "delete block on read-only volume": I think there are two cases here: (1) delete block that we have two copies of, one on a read-only volume and one on a writable volume → 200. (2) delete block that appears only on read-only volumes → 405 (?)

#15 Updated by Tim Pierce almost 6 years ago

Minor fixes added at fb460bd

Tom Clegg wrote:

Without claiming to have properly reviewed this, I noticed a couple of minor things:
  • Do we log successful DELETE requests somewhere? (I see where GET and PUT are logged, but not DELETE, index, or status... maybe those are missing too?)

Yes, good point, they are missing. I'll add logging for DELETE, but we really don't have consistent logging for success or failure for any of these. That's part of what #3229 is intended to address.

  • In TODO comment, for "delete block on read-only volume": I think there are two cases here: (1) delete block that we have two copies of, one on a read-only volume and one on a writable volume → 200. (2) delete block that appears only on read-only volumes → 405 (?)

Added.

#16 Updated by Misha Zatsman almost 6 years ago

So far I've only looked through the tests, but that looks great.

Here are some comments:

// Cases to test:

This should actually be "Cases tested:" now, right?

defer func() { KeepVM.Quit() }()
This can actually just be
defer KeepVM.Quit()
right?

uri: "http://localhost:25107/" ..
Where does this host address come from? Can we look up the port number somewhere?
Also, the host is used repeatedly, can we put it in a variable like "keep_host" or if it's actually bogus can we put something like "spoofed_keep_host" with a comment above explaining that we just need a correct-looking host but it's not actually used?

"json.NewDecoder(response.Body).Decode(&dc)"
Wow, that's so cool!

I think requesting to delete a nonexistent block should return a 404, not a 200. I can't think of a case where the data manager would ever request that a block be deleted without knowing whether the block exists. What's the rationale for making it return a 200?

"if dc.Deleted != 1 || dc.Failed != 0 {"
Can we do something like this?:

var expected_dc deletecounter
expected_dc.Deleted = 1
expected_dc.Failed = 0
if response_dc != expected_dc {
t.Errorf("superuser request to delete existing block, expected response %+v but instead received response %+v", expected_dc, response_dc)

#17 Updated by Tim Pierce almost 6 years ago

Revised code at fe5556e.

Misha Zatsman wrote:

So far I've only looked through the tests, but that looks great.

Here are some comments:

// Cases to test:

This should actually be "Cases tested:" now, right?

Yup, fixed.

defer func() { KeepVM.Quit() }()
This can actually just be
defer KeepVM.Quit()
right?

Good point! Changed throughout the code.

uri: "http://localhost:25107/" ..
Where does this host address come from? Can we look up the port number somewhere?
Also, the host is used repeatedly, can we put it in a variable like "keep_host" or if it's actually bogus can we put something like "spoofed_keep_host" with a comment above explaining that we just need a correct-looking host but it's not actually used?

In these URIs, the hostname and port are bogus. IssueRequest uses this string as the URI to build an http.Request object that gets passed to the REST router. The only important part is the URI path.

In fact, now that you've asked this question, I tested it, and it turns out you can build an http.Request with just a URI path! As long as you don't care that the request.Host field is empty. So I'm taking that out of all the tests. Much much cleaner.

"json.NewDecoder(response.Body).Decode(&dc)"
Wow, that's so cool!

It feels just a little sleazy, to be honest. :-)

I think requesting to delete a nonexistent block should return a 404, not a 200. I can't think of a case where the data manager would ever request that a block be deleted without knowing whether the block exists. What's the rationale for making it return a 200?

The code is written this way because the story specified that a successful request should return 200 and didn't make an exception for the case where copies_deleted = 0. But I agree with you and Tom did not disagree when I suggested it, so I'm changing the story. :-)

"if dc.Deleted != 1 || dc.Failed != 0 {"
Can we do something like this?:

var expected_dc deletecounter
expected_dc.Deleted = 1
expected_dc.Failed = 0
if response_dc != expected_dc {
t.Errorf("superuser request to delete existing block, expected response %+v but instead received response %+v", expected_dc, response_dc)

Yup, that's a nice idea. Done. (I have some other code on the back burner for templatizing these tests even more -- I'll run them by you when I get a chance)

#18 Updated by Tim Pierce almost 6 years ago

  • Description updated (diff)

#19 Updated by Misha Zatsman almost 6 years ago

uri: "http://localhost:25107/" ..
Where does this host address come from? Can we look up the port number somewhere?
Also, the host is used repeatedly, can we put it in a variable like "keep_host" or if it's actually bogus can we put something like "spoofed_keep_host" with a comment above explaining that we just need a correct-looking host but it's not actually used?

In these URIs, the hostname and port are bogus. IssueRequest uses this string as the URI to build an http.Request object that gets passed to the REST router. The only important part is the URI path.

In fact, now that you've asked this question, I tested it, and it turns out you can build an http.Request with just a URI path! As long as you don't care that the request.Host field is empty. So I'm taking that out of all the tests. Much much cleaner.

Sweet!

"if dc.Deleted != 1 || dc.Failed != 0 {"
Can we do something like this?:

var expected_dc deletecounter
expected_dc.Deleted = 1
expected_dc.Failed = 0
if response_dc != expected_dc {
t.Errorf("superuser request to delete existing block, expected response %+v but instead received response %+v", expected_dc, response_dc)

Yup, that's a nice idea. Done. (I have some other code on the back burner for templatizing these tests even more -- I'll run them by you when I get a chance)

Sure thing, I'd love to hear about it!

Looking at the rest of the code now.

#20 Updated by Misha Zatsman almost 6 years ago

Some more comments from the tests:

var dc, expected_dc deletecounter

Can we call dc "response_dc" or something else indicating that it summarizes the response from the handler?

response = IssueRequest(rest, superuser_nonexistent_block_req)
ExpectStatusCode(t,
    "data manager request, nonexistent block",
    http.StatusNotFound,
    response)
// Expect response {"copies_deleted":0,"copies_failed":0}
expected_dc = deletecounter{0, 0}
json.NewDecoder(response.Body).Decode(&dc)
if dc != expected_dc {
    t.Errorf("superuser_nonexistent_block_req\nexpected: %+v\nreceived: %+v",
        expected_dc, dc)
}

If we return a 404, the delete count will always be 0,0, right?

So I think we shouldn't return a count at all in that case, because it doesn't contain any new information.

// Confirm the block has been deleted
_, err := vols[0].Get(TEST_HASH)
if !os.IsNotExist(err) {
    t.Error("superuser_existing_block_req: block not deleted")
}

I found this logic really hard to follow, I tried to think of a better way to represent it, and this is the best I could come up with:

// Confirm the block has been deleted
_, err := vols[0].Get(TEST_HASH)
var block_deleted = os.IsNotExist(err)
if !block_deleted {
    t.Error("superuser_existing_block_req: block not deleted")
}

I think that's a little better in that it gets rid of the double-negative in one line, and hopefully makes it a bit easier for the reader to parse.

What do you think?

#21 Updated by Misha Zatsman almost 6 years ago

Review of handlers.go

Oblivious question:
Your handlers are defined as taking hashes, but in the keep server doc, we described the methods as taking locators. Are the locators stripped out before the requests get to the handler? Or have they just not been implemented yet?

// Upon receiving a valid request from an authorized user,
// DeleteHandler deletes all copies of the specified block on local
// volumes.

On local writeable volumes, right?

//   * If blocks were found but none could be deleted (copies_deleted
//     == 0 and copies_failed > 0), the response code is 405 Method Not
//     Allowed.

That seems like the wrong response. I'd argue this should be a 200. The caller doesn't know whether the block is located on read-only or writeable volumes, or both, so I don't think they've done anything wrong by requesting it to be deleted.

resp.Write(j)

I think we should only do this in cases that return a 200.

log.Printf("json.Marshal: %s\n", err)
log.Printf("result = %v\n", result)

I think these should be combined into one line, something like "Failed to marshall json with request %v, received error %s"

In general, if we're handling multiple requests simultaneously, the log lines from a single request will get woven in with other requests, right? If that's the case I think we should prefix all log lines with the request that we're actually handling (i.e. delete /+hash for this case) to make it clearer for the log reader.

#22 Updated by Misha Zatsman almost 6 years ago

volume.go:

func (v *MockVolume) Delete(loc string) error {
   if _, ok := v.Store[loc]; ok {
       delete(v.Store, loc)
       return nil
   }
   return os.ErrNotExist

I don't understand the purpose of the trailing "ok" here. Wouldn't this work fine?:

func (v *MockVolume) Delete(loc string) error {
   if _, ok := v.Store[loc] {
  ...

That's what's suggested at: http://blog.golang.org/go-maps-in-action

That's all, I got to the end of you change.

#23 Updated by Tim Pierce almost 6 years ago

New changes at 5219256

Misha Zatsman wrote:

Some more comments from the tests:

[...]

Can we call dc "response_dc" or something else indicating that it summarizes the response from the handler?

Yup, I'll go along with that.

If we return a 404, the delete count will always be 0,0, right?

So I think we shouldn't return a count at all in that case, because it doesn't contain any new information.

OK, I guess that makes sense. Done. I'll update the story to match.

I found this logic really hard to follow, I tried to think of a better way to represent it, and this is the best I could come up with:

[...]

I think that's a little better in that it gets rid of the double-negative in one line, and hopefully makes it a bit easier for the reader to parse.

Looks good to me :-)

Review of handlers.go

Oblivious question:
Your handlers are defined as taking hashes, but in the keep server doc, we described the methods as taking locators. Are the locators stripped out before the requests get to the handler? Or have they just not been implemented yet?

I've only tested the full locator syntax for GET and PUT, and I suspect that the handler for DELETE as it's written now will only match on a simple hash. I do want it to accept any kind of locator here, though; it should just ignore any hints that are supplied. I'll see if I can do it easily.

On local writeable volumes, right?

Yup. Updated this comment.

That seems like the wrong response. I'd argue this should be a 200. The caller doesn't know whether the block is located on read-only or writeable volumes, or both, so I don't think they've done anything wrong by requesting it to be deleted.

The client certainly hasn't done anything wrong to request it. But when the server can't satisfy the client's request for some reason, it seems appropriate to signal that appropriately in the response code. But this is probably overthinking the problem, and I really don't care that strongly about it. HTTP 200 is fine.

Updated to return HTTP 200 for cases where all found blocks could not be deleted.

I think these should be combined into one line, something like "Failed to marshall json with request %v, received error %s"

In general, if we're handling multiple requests simultaneously, the log lines from a single request will get woven in with other requests, right? If that's the case I think we should prefix all log lines with the request that we're actually handling (i.e. delete /+hash for this case) to make it clearer for the log reader.

Good point about combining them into one log statement. I have filed #3229 to improve Keep logging more generally the way you describe.

Misha Zatsman wrote:

volume.go:

[...]

I don't understand the purpose of the trailing "ok" here. Wouldn't this work fine?:

[...]

That's what's suggested at: http://blog.golang.org/go-maps-in-action

This is a Go-specific construction known as the "comma ok" idiom. The ok variable is set to true if the element exists in the map. (Also works for type assertions, possibly a couple of other things).

That page actually explains this usage and I think it recommends what I did here, but if you can point out a specific line that disagrees I'll definitely take another look. :-)

#24 Updated by Tim Pierce almost 6 years ago

  • Description updated (diff)

#25 Updated by Misha Zatsman almost 6 years ago

Tim Pierce wrote:

Misha Zatsman wrote:

volume.go:

[...]

I don't understand the purpose of the trailing "ok" here. Wouldn't this work fine?:

[...]

That's what's suggested at: http://blog.golang.org/go-maps-in-action

This is a Go-specific construction known as the "comma ok" idiom. The ok variable is set to true if the element exists in the map. (Also works for type assertions, possibly a couple of other things).

That page actually explains this usage and I think it recommends what I did here, but if you can point out a specific line that disagrees I'll definitely take another look. :-)

Yes, I believe I understand the comma ok idiom, and that's why I used it in my suggested code. The difference between what you wrote and what I suggested does not involve the comma at all, but the semicolon ok that you ended the conditional with. There is no suggestion of using a semicolon or a second ok in that page.

#26 Updated by Misha Zatsman almost 6 years ago

Misha Zatsman wrote:

Tim Pierce wrote:

Misha Zatsman wrote:

volume.go:

[...]

I don't understand the purpose of the trailing "ok" here. Wouldn't this work fine?:

[...]

That's what's suggested at: http://blog.golang.org/go-maps-in-action

This is a Go-specific construction known as the "comma ok" idiom. The ok variable is set to true if the element exists in the map. (Also works for type assertions, possibly a couple of other things).

That page actually explains this usage and I think it recommends what I did here, but if you can point out a specific line that disagrees I'll definitely take another look. :-)

Yes, I believe I understand the comma ok idiom, and that's why I used it in my suggested code. The difference between what you wrote and what I suggested does not involve the comma at all, but the semicolon ok that you ended the conditional with. There is no suggestion of using a semicolon or a second ok in that page.

Ugh, sorry about that, now I get what the semicolon ok is for. I was only focusing on the setting of ok, but you need the semicolon and the second ok to feed to the if statement.

#27 Updated by Tim Pierce almost 6 years ago

Misha Zatsman wrote:

Ugh, sorry about that, now I get what the semicolon ok is for. I was only focusing on the setting of ok, but you need the semicolon and the second ok to feed to the if statement.

Yeah, it's a little confusing when you're coming from languages like C/C++/Perl where an assignment statement is also an expression, and can be implicitly cast to a boolean. I had to double check myself on that the first few times too :-)

#28 Updated by Tim Pierce almost 6 years ago

  • Description updated (diff)
  • Respond 403 Forbidden if the token is valid (users.get responds 200) but does not pass is_admin/scopes verification, or users.get responds 403 (which is another scope problem).
  • Respond 401 Unauthorized If the token is invalid (users.get responds 401) (This probably means the token has expired.)

This is dicey because we don't currently get this level of detail from the sdk.ArvadosClient back end. The client just returns an ArvadosErrorBadRequest for any 4xx error.

We don't have enough time left in the sprint to update the SDK and all of the code that uses it (keepclient, keepproxy) so I'm pushing this part of the story to #3483.

#29 Updated by Anonymous almost 6 years ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|commit:6d0b4ad4696fa11a3e940cb731f134a61fc26729.

Also available in: Atom PDF