Feature #8554

[Keep] Implement trash/untrash behavior in volume_unix

Added by Peter Amstutz over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
Keep
Target version:
Start date:
03/11/2016
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
2.0

Description

Implementation

Trash:
  • If trash-ttl > 0:
    1. set deadline = now + trash-ttl
    2. Rename abc/{hash} to abc/{hash}.trash.{deadline}
  • If trash-ttl == 0:
    1. Just delete abc/{hash} now (i.e., same behavior we have now)
Untrash:
  1. Rename the first file named abc/{hash}.trash.* (there might be more than one) to abc/{hash}
EmptyTrash:
  1. Walk hierarchy looking for {hash}.trash.*
  2. If deadline < now, delete the file
  3. Log amount of deleted stuff: number of blocks, number of bytes
  4. Log amount of undeleted stuff still in trash: number of blocks, number of bytes
Empty-trash goroutine (this part is not specific to unix volumes):
  1. Use one time.Ticker with configurable trash-check interval (default 1 day?)
  2. When ticker fires, empty trash on all volumes

Subtasks

Task #8612: Review branch 8554-trash-untrash-unix-volumeResolvedRadhika Chippada


Related issues

Related to Arvados - Story #8178: [Keepstore] Implement interfaces trash area and undelete endpointResolved01/11/2016

Associated revisions

Revision 5ba59abf
Added by Radhika Chippada over 5 years ago

closes #8554
Merge branch '8554-trash-untrash-unix-volume'

History

#1 Updated by Brett Smith over 5 years ago

  • Target version set to Arvados Future Sprints

#2 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)

#3 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)

#4 Updated by Radhika Chippada over 5 years ago

  • Category set to Keep
  • Assigned To set to Radhika Chippada
  • Target version changed from Arvados Future Sprints to 2016-03-16 sprint

#5 Updated by Radhika Chippada over 5 years ago

  • Status changed from New to In Progress

#6 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)

#7 Updated by Tom Clegg over 5 years ago

(from irc) Duration flags should be DurationVar not IntVar

The doneEmptyingTrash line here should move down two lines, so it stops the trash collector after a signal is received, rather than during program startup...

        go func(sig <-chan os.Signal) {
+               doneEmptyingTrash <- true
                s := <-sig
                log.Println("caught signal:", s)
                listener.Close()

It would be better (and easier to test reliably) to pass the doneEmptyingTrash channel to the emptyTrash() func explicitly rather than using a global.

I think this should break if err == nil, not if err != nil. (With the current code, we give up as soon as we fail to untrash a block. Shouldn't we stop as soon as we succeed in untrashing, and return an error only if none of the trash we found was untrashable?)

       for _, f := range files {
               if strings.HasPrefix(f.Name(), prefix) {
                       err = os.Rename(v.blockPath(f.Name()), v.blockPath(loc))
                       if err != nil {
                               break
                       }
               }
       }

Also in Untrash, if no trashed block was found, we should return a non-nil error that satisfies os.IsNotExist().

This regexp should be more careful to only match files that trash could conceivably be used by the trash function. Suggest:

-var trashRegexp = regexp.MustCompile(`.*([0-9a-fA-F]{32}).trash.(\d+)`)
+var trashRegexp = regexp.MustCompile(`/([0-9a-f]{32}).trash.(\d+)$`)

Style: Please un-pyramid the code in (v *UnixVolume) EmptyTrash; e.g., return nil after log.Printf, instead of putting the rest of the func in an else.

               if err != nil {
                       log.Printf("EmptyTrash error for %v: %v", path, err)
               } else if !info.Mode().IsDir() {
                       matches := trashRegexp.FindStringSubmatch(path)
                       if len(matches) == 3 {
                               ...

Also in (v *UnixVolume) EmptyTrash -- either it should return a non-nil error if an error is encountered during filepath.Walk() (either passed in to the walk function, or encountered in the walk function), or it should not have a function signature that suggests it is capable of returning errors. The latter is probably more reasonable: the only thing the caller would do differently is log the error, and we already log errors right from EmptyTrash.

(v *UnixVolume) EmptyTrash should print stats even if it encountered an error (i.e., the stats shouldn't be in an "else")

The volume tests shouldn't test the emptyTrash goroutine. Instead they should just call EmptyTrash on the volume being tested. This way it won't be necessary to rely on sleep(), and we can reduce the three nearly-identical test functions to one sequence that checks various behaviors:
  • put block
  • set trashLifetime=time.Hour
  • trash block
  • if err == ErrNotImplemented { return } (rest of the tests don't apply)
  • get block → os.IsNotExist(err)
  • empty trash
  • untrash block
  • get block → err == nil
  • untrash block → os.IsNotExist(err)
  • get block → err == nil
  • set trashLifetime=time.Nanosecond
  • trash block
  • untrash block → err == nil (confirm the data doesn't go away until we call EmptyTrash)
  • trash block
  • empty trash
  • untrash block → os.IsNotExist(err)
  • get block → os.IsNotExist(err)
  • put block
  • set trashLifetime=time.Nanosecond
  • trash block
  • put same block again → err == nil
  • empty trash
  • get block → err == nil (confirm we don't delete an un-trashed block by mistake even if the same block is also in trash)
  • trash block
  • set trashLifetime=time.Hour
  • trash block → err == nil
  • empty trash
  • untrash block → err == nil (confirm we don't delete the unexpired copy, even if the same block is in both expired and unexpired trash)

I think if we change the empty-trash condition from "deadline < now" to "deadline <= now" then a trash lifetime of 1ns should assure us that whatever we put in the trash is eligible for deletion right away (even if we empty trash in the same nanosecond, the unix timestamp only has 1-second precision so a trash lifetime under 1s would effectively mean "eligible for emptying right now").

The existing "delete" tests already test for correct behavior when trashLifetime==0 so we don't need to add more tests for that, right?

Nit: Instead of parsing as int with Atoi and then casting to int64, we could just use strconv.ParseInt here:

-deadline, err := strconv.Atoi(matches[2])
-...
-if int64(deadline) < ...
+deadline, err := strconv.ParseInt(matches[2], 10, 64)
+...
+if deadline < ...

#8 Updated by Radhika Chippada over 5 years ago

(from irc) Duration flags should be DurationVar not IntVar

Done

The doneEmptyingTrash line here should move down two lines, so it stops the trash collector after a signal is received, rather than during program startup...

Done

It would be better (and easier to test reliably) to pass the doneEmptyingTrash channel to the emptyTrash() func explicitly rather than using a global.

Done

I think this should break if err == nil, not if err != nil.

Good catch. Fixed

and return an error only if none of the trash we found was untrashable?

Done

Also in Untrash, if no trashed block was found, we should return a non-nil error that satisfies os.IsNotExist().

Done

This regexp should be more careful to only match files that trash could conceivably be used by the trash function. Suggest:

Updated

Style: Please un-pyramid the code

Done

Also in (v *UnixVolume) EmptyTrash -- either it should return a non-nil error if an error is encountered ...

Done (changed to not return error)

(v *UnixVolume) EmptyTrash should print stats even if it encountered an error (i.e., the stats shouldn't be in an "else")

Done

Test sequence listed ...

Added test with the listed sequence. (There is one extra trash listed in this sequence: the fifth from the bottom before changing trashLifetime to 1 hour. I omitted this)

The volume tests shouldn't test the emptyTrash goroutine. Instead they should just call EmptyTrash on the volume being tested.

I am uncomfortable with the idea that we do not test the emptyTrash goroutine at all. Instead of removing this (after adding your suggested test sequence), I improved the test with the goroutine to use 1ns for trashLifetime so that it does not waste time during the test. But if you'd rather remove it, I will remove it.

The existing "delete" tests already test for correct behavior when trashLifetime==0 so we don't need to add more tests for that, right?

Deleted it

Nit: Instead of parsing as int with Atoi and then casting to int64, we could just use strconv.ParseInt here:

Done

#9 Updated by Tom Clegg over 5 years ago

I am uncomfortable with the idea that we do not test the emptyTrash goroutine at all. Instead of removing this (after adding your suggested test sequence), I improved the test with the goroutine to use 1ns for trashLifetime so that it does not waste time during the test. But if you'd rather remove it, I will remove it.

The idea of testing emptyTrash is nice, but having it fire every 1ns while we're running our volume tests doesn't seem like a very effective way to test it. We still have to call EmptyTrash ourselves in order to avoid races, so the goroutine mainly has the effect of making the test flaky:

--- FAIL: TestUnixVolumeWithGenericTests (0.32s)
        volume_generic_test.go:1007: file does not exist

We could test the emptyTrash goroutine separately by setting up some UnixVolumes, calling Put and Trash, and waiting in a busy loop for the trashed data to disappear. This doesn't belong in the generic volume test suite, though -- we don't need to test it for each volume type/config.

#10 Updated by Radhika Chippada over 5 years ago

Deleted testTrashUntrashWithEmptyTrashGoroutine

#11 Updated by Radhika Chippada over 5 years ago

  • Story points set to 2.0

#12 Updated by Tom Clegg over 5 years ago

Radhika Chippada wrote:

Added test with the listed sequence. (There is one extra trash listed in this sequence: the fifth from the bottom before changing trashLifetime to 1 hour. I omitted this)

I've added that in, with comments. The idea is to put the same data in the trash twice, with different deadlines, and then empty the trash after the first deadline has passed but before the second one arrives. In this case it should still be possible to untrash: i.e., we don't throw out the "new" trash just because we also had identical "old" trash.

Also escaped the "."s in the regexp and moved it next to the function that uses it.

9c571f5 look ok?

#13 Updated by Tom Clegg over 5 years ago

...and the rest of the un-pyramid thing → 7309fab

#14 Updated by Radhika Chippada over 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:5ba59abf2fece90309815b15b2d118860fb3b1b3.

Also available in: Atom PDF