Project

General

Profile

Actions

Idea #8178

closed

[Keepstore] Implement interfaces trash area and undelete endpoint

Added by Brett Smith over 8 years ago. Updated about 8 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
-
Target version:
Story points:
0.0

Description

Add a configuration knob to keepstore that means "move blocks on the trash list to a trash area, then actually delete them after they stay there for time interval N."
  • -trash-lifetime 240h

Rename the delete method of storage volumes to trash.

Add an untrash method to the storage volume interface.

Add an undelete endpoint for Keepstore that's authenticated by the data manager token, and moves a specified block locator out of trash back to main storage using the new untrash method.

If an admin sets a nonzero trash timer using the new configuration knob, and the underlying storage volume does not support the new untrash method, keepstore should refuse to start.

This story does not include implementations for any concrete storage volume.


Subtasks 3 (0 open3 closed)

Task #8272: Review branch: 8178-keepstore-trash-interfaceResolvedRadhika Chippada01/11/2016Actions
Task #8310: Review branch: 8178-trash-interface-generic-volume-testResolvedTom Clegg01/27/2016Actions
Task #8353: Fix AdRoll/goamz stubResolvedTom Clegg01/11/2016Actions

Related issues

Related to Arvados - Feature #8554: [Keep] Implement trash/untrash behavior in volume_unixResolvedRadhika Chippada03/11/2016Actions
Related to Arvados - Feature #8556: [Keep] Implement trash/untrash behavior in azure_blob_volumeResolvedRadhika Chippada05/05/2016Actions
Related to Arvados - Feature #8555: [Keep] Implement trash/untrash behavior in s3_volumeResolvedTom Clegg06/16/2016Actions
Actions #1

Updated by Brett Smith over 8 years ago

  • Target version set to Arvados Future Sprints
Actions #2

Updated by Brett Smith over 8 years ago

  • Description updated (diff)
  • Subject changed from [Keepstore] Blocks on the trash list are moved to trash, and actually deleted later, after another timer to [Keepstore] Implement interfaces trash area and undelete endpoint
Actions #3

Updated by Brett Smith over 8 years ago

  • Description updated (diff)
Actions #4

Updated by Brett Smith over 8 years ago

  • Story points set to 2.0
Actions #5

Updated by Tom Clegg over 8 years ago

  • Description updated (diff)
Actions #6

Updated by Radhika Chippada about 8 years ago

  • Status changed from New to In Progress
  • Assigned To set to Radhika Chippada
Actions #7

Updated by Brett Smith about 8 years ago

  • Target version changed from Arvados Future Sprints to 2016-02-03 Sprint
Actions #8

Updated by Radhika Chippada about 8 years ago

a4d0d55ac333e8060d13a600d8aa8f5443760b29

  • Renamed Delete as Trash and updated comment in volume.go
  • Added Untrash (with empty implementation for now)
  • Added undelete endpoint (UndeleteHandler) and added tests
  • Added -trash-lifetime in keepstore.go
    • s3 volume adder and azure blob volume adder are update to return error when trash-lifetime <= 0
    • Returning error in volume_unix is causing test failures. I tried to send it in using run_test_servers (similar to never-delete), but I am still seeing errors with this. For the time being, I commented out the return error statement. I will continue researching into this; please let me know if you see where else I should be sending this in when keepstore servers are being started.
Actions #9

Updated by Tom Clegg about 8 years ago

8178-keepstore-trash-interface @ a4d0d55

-trash-lifetime should be a time.Duration -- this way flag.DurationVar will take care of parsing "240h"

Usage message doesn't need fmt.Sprintf(), it's just a string. Suggest: "Interval after a block is trashed during which it can be recovered using an /untrash request"

Sorry I didn't think/say this earlier, but I think we should call the http route "untrash" rather than "undelete". Currently we accept an HTTP "delete" method, but everywhere else we refer to "trash", and I think it will be less confusing if we reserve the word "delete" for the irreversible thing that happens to blocks that have been in the trash long enough.

UndeleteHandler shouldn't break after succeeding on one volume: if the block is in the trash on several volumes, we should untrash all of them.

If there are no writable volumes, we should respond http.StatusNotFound with "no writable volumes" in the body.

If os.IsNotExist(err) returns true for all errors, we should respond http.StatusNotFound

(We should return 500 only if we get an error other than "doesn't exist"... and I think it would be good form to say http.StatusInternalServerError not 500)

Calling resp.Write() before resp.WriteHeader() causes an implicit resp.WriteHeader(http.StatusOK) -- we should either reverse the order of these calls, or return after writing the body in the 200 case.

The stubbed Untrash methods should return ErrNotImplemented -- not nil, which indicates the block was untrashed.

The Trash() methods should return ErrNotImplemented if trashLifetime != 0.

The volumeAddr Set() methods should return ErrNotImplemented if trashLifetime != 0 (not if trashLifetime <= 0). trashLifetime == 0 is the only supported case right now. The returned/reported error should be something like "trashLifetime > 0 is not supported by this volume type".

Actions #10

Updated by Radhika Chippada about 8 years ago

Addressed all those review comments at 3b31f110db82c93c3aade294f50bbb0218c74697

Actions #11

Updated by Tom Clegg about 8 years ago

The real volumes' Trash() methods should return ErrNotImplemented if trashLifetime != 0.

This doesn't seem right. ErrNotImplemented means the block was not untrashed, and should be reported as an error:

               if err == nil || err == ErrNotImplemented {
                       log.Printf("Untrashed %v on volume %v", hash, vol.String())

(It looks like the tests should pass without this oddity, since the handler gets tested with MockVolumes, which return nil without actually untrashing anything -- right?)

Style nit: use "else if" or a switch, and follow "handle errors first" Go idiom, so instead of
  •                if err == nil {
                           ...
                   } else {
                           if os.IsNotExist(err) {
                                   ...
                           } else {
                                   ...
                           }
                   }
    

    ...something like
  •                if os.IsNotExist(err) {
                           ...
                   } else if err != nil {
                           ...
                   } else {
                           ...
                   }
    

It would be valuable to add a generic volume test for the "Trash with non-zero trash time, then Untrash" sequence; for now it would be acceptable to either return ErrNotImplemented from Trash() (in which case Untrash doesn't get tested) or return nil, and then successfully Untrash() the block so Get() works again. I suggest adding this test in its own branch -- IMO it shouldn't prevent merging 8178-keepstore-trash-interface, but we could do it before closing this issue.

Actions #12

Updated by Radhika Chippada about 8 years ago

The real volumes' Trash() methods should return ErrNotImplemented if trashLifetime != 0.

Done

This doesn't seem right. ErrNotImplemented means the block was not untrashed, and should be reported as an error:

Done. This was a bit unclear from previous comment, but now addressed.

(It looks like the tests should pass without this oddity, since the handler gets tested with MockVolumes, which return nil without actually untrashing anything -- right?)

Updated MockVolume to return nil, instead of ErrNotImplemented

Style nit

Done

Adding test in generic volume test

Yes, I also thought of this. I will add it in a separate branch as you suggested (after merging this in master). Thanks.

Actions #13

Updated by Tom Clegg about 8 years ago

LGTM @ daa3cc4, thanks

Actions #14

Updated by Radhika Chippada about 8 years ago

Branch 8178-trash-interface-generic-volume-test adds generic volume tests for the new trash / untrash interface at 5a7b7ea65edaf40f5d9c17adc2184b0a96f05a2d

Actions #15

Updated by Tom Clegg about 8 years ago

Tom Clegg wrote:

It would be valuable to add a generic volume test for the "Trash with non-zero trash time, then Untrash" sequence; for now it would be acceptable to either return ErrNotImplemented from Trash() (in which case Untrash doesn't get tested) or return nil, and then successfully Untrash() the block so Get() works again. I suggest adding this test in its own branch -- IMO it shouldn't prevent merging 8178-keepstore-trash-interface, but we could do it before closing this issue.

More specifics:

testTrashUntrash
  • Set trashLifetime=3600 for duration of test case
  • PutRaw a block, and backdate it
  • Trash the block
  • If the volume is not writable:
    • Trash must return ErrMethodDisabled
  • Else if Trash returned a non-nil err:
    • err must be ErrNotImplemented
  • Else (i.e., Trash returned err==nil):
    • Get the block → must fail, os.IsNotExist() must return true for the returned error
    • Untrash the block → must succeed
  • Then, regardless of how Trash returned:
    • Get the block → must succeed

When the implementations catch up, we will disallow the ErrNotImplemented possibility.

I don't think we should duplicate the existing tests for "trash block that's newer than blobSignatureTTL", etc. (If anything, we could add a loop in the existing test (testDeleteNewBlock) so it checks once with trashLifetime=0 and once with trashLifetime=time.Hour. But testTrashUntrashNewBlock at 5a7b7ea for example doesn't seem to test anything we aren't already testing -- except ensuring that Untrash() is not implemented, which is only one of multiple acceptable possibilities that should be tested in combination with Trash() as above.)

Actions #16

Updated by Radhika Chippada about 8 years ago

Updated the trash/untrash test with the sequence of steps listed above.

I noticed one issue while writing the test. The s3 test setup seems to empty the block when backdated. Due to this Get after a trash / untrash sequence does not pass comparison as expected with the TestBlock. I tweaked the test to pass in this case as well. But, we might want to fix the issue with the s3 test setup for the long term.

Actions #17

Updated by Peter Amstutz about 8 years ago

  • Target version changed from 2016-02-03 Sprint to 2016-02-17 Sprint
Actions #18

Updated by Peter Amstutz about 8 years ago

  • Story points deleted (2.0)
Actions #19

Updated by Radhika Chippada about 8 years ago

  • Story points set to 0.0
Actions #20

Updated by Tom Clegg about 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:3920b0d627ac89687a741b186554c1afd61750f5.

Actions

Also available in: Atom PDF