Feature #8556

[Keep] Implement trash/untrash behavior in azure_blob_volume

Added by Peter Amstutz over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
Keep
Target version:
Start date:
05/05/2016
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
2.0

Description

Implementation

  1. Trash → set <ExpiresAt>123</ExpiresAt> and create a marker (just an empty blob) named trash/000000000123/hash
  2. Untrash → unset ExpiresAt and delete trash/000000000123/hash
  3. EmptyTrash → index trash/* in order. For each trash marker with deadline <= now:
    • Look up the modtime and ExpiresAt for the data block itself.
    • If the data is still old and ExpiresAt is still set, delete the data block (otherwise do nothing).
    • Delete the trash marker (regardless of whether the data block was deleted).
  4. Get, Mtime, Touch, IndexTo → check ExpiresAt metadata. If present, return/skip as if the block was not found at all.

Tests / Acceptance Criteria

Ensure {Put; Trash; Put} results in ExpiresAt being cleared. This might happen by itself, or we might need to use SetBlobMetadata to clear the metadata explicitly after writing each block. (The test for this should already be in place, in the generic volume test: {Put; Backdate; Trash; Put; Backdate; empty trash}.)

Ensure the Azure server stub behaves the same as the real Azure server wrt any feature we're relying on for this. E.g., it's not enough to know that our stub clears the metadata when overwriting a block. We need to know that the real Azure service is documented to behave that way, and actually does behave that way (e.g., via manual testing with real Azure containers).

All existing tests in volume_generic_test must pass, even when "Trash with trashLifetime>0" is implemented.

In the first set of testTrashEmptyTrashUntrash, with trashLifetime>0, after the "Get() returns notfound", check:
  • Mtime → notfound
  • Compare → notfound
  • IndexTo does not include the trashed block
  • Touch → notfound
In the first set of testTrashEmptyTrashUntrash, with trashLifetime>0, after the "Get() returns data after Untrash()", check:
  • Mtime → ok
  • Compare → ok
  • IndexTo includes the block
  • Touch → ok
In testDeleteOldBlock(), after checking v.Get(TestHash), check:
  • Mtime → notfound
  • Compare → notfound
  • IndexTo does not include the trashed block
  • Touch → notfound

Future work (defer)

When doing a full index (prefix=""), build two trash lists: a list of blocks whose metadata indicates that they are trash, and a list of trash markers. After sending the index response and closing the connection, delete any trash markers whose block metadata indicates that they are not trash.


Subtasks

Task #8740: Review branch 8556-trash-untrash-azure-volumeResolvedRadhika Chippada


Related issues

Related to Arvados - Story #8178: [Keepstore] Implement interfaces trash area and undelete endpointResolved01/11/2016

Blocked by Arvados - Story #8833: [Keep] Azure SDK update to add metadata to ListBlobs callResolved04/28/2016

Associated revisions

Revision abd7a595 (diff)
Added by Tom Clegg over 5 years ago

8556: Update method signature for current SDK version.

refs #8556

Revision aa1d6f9c
Added by Radhika Chippada over 5 years ago

closes #8556
Merge branch '8556-azure-trash'

History

#1 Updated by Peter Amstutz over 5 years ago

  • Tracker changed from Bug to Feature

#2 Updated by Brett Smith over 5 years ago

  • Target version set to Arvados Future Sprints

#3 Updated by Peter Amstutz over 5 years ago

Relevant APIs

"Lease Container" operation establishes and manages a lock on a container for delete operations.

"List Blobs" operation enumerates the list of blobs under the specified container. Can query by prefix. Blob metadata is returned by the list blobs operation with the "include=metadata" parameter.

"Delete Blob" operation marks the specified blob or snapshot for deletion. When a blob is successfully deleted, it is immediately removed from the storage account's index and is no longer accessible to clients. The blob's data is later removed from the service during garbage collection. It is an asynchronous operation (returns HTTP 202 Accepted)

"Copy Blob" operation copies a blob to a destination within the storage account. It is an asynchronous operation (returns HTTP 202 Accepted).

"Get Blob" reads or downloads a blob from the system. Blob metadata is returned in the headers.

#4 Updated by Peter Amstutz over 5 years ago

The biggest limiting factor is that "List Blobs" has extremely limited options for filtering (pretty much just by prefix). Filtering by metadata, for example, isn't supported at all.

Using "Copy Blob" incurs additional cost:

When you copy a source blob to a destination blob with a different name within the same account, you use additional storage resources for the new blob, so the copy operation results in a charge against the storage account’s capacity usage for those additional resources. However, if the source and destination blob name are the same within the same account (for example, when you promote a snapshot to its base blob), no additional charge is incurred other than the extra copy metadata stored in version 2012-02-12 and newer.

When you promote a snapshot to replace its base blob, the snapshot and base blob become identical. They share blocks or pages, so the copy operation does not result in an additional charge against the storage account's capacity usage. However, if you copy a snapshot to a destination blob with a different name, an additional charge is incurred for the storage resources used by the new blob that results. Two blobs with different names cannot share blocks or pages even if they are identical. For more information about snapshot cost scenarios, see Understanding How Snapshots Accrue Charges.

It's not clear what happens if you initiate a Copy Blob operation and then perform Delete Blob before the copy is complete (according to Stack Overflow, this is bad news: https://stackoverflow.com/questions/3734672/azure-storage-blob-rename)

We can update/get metadata on a easily on a per-block basis, just not filter Blob List on it.

#5 Updated by Peter Amstutz over 5 years ago

Tentative trash design:

  1. Initiate copy blob XYZ to "trash/XYZ" and set metadata <ExpiresAt>123</ExpiresAt> on the trash target.
  2. Conditionally (not-modified-since) update metadata on the source blob to <ExpiresAt>123</ExpiresAt>
  3. List blobs with prefix "trash/"
  4. Conditionally (not-modified-since) try to delete the active blob that the trash blob was copied from
    1. Unless the trashed blob has metadata <SourceDeleted>true</SourceDeleted> so we don't keep doing this all the time.
    2. Set metadata <SourceDeleted>true</SourceDeleted> on the trash blob after deleting the source
  5. Actually delete blobs in "trash/" for which <ExpiresAt>123</ExpiresAt> has passed.
  6. When fetching a blob, treat it as a "not found" if ExpiresAt is metadata is present.
  7. Must ensure that any PUTs to active blob store also reset metadata on the blob.

Rationale: the reason for this shuffle is to avoid having to enumerate the full list of active blobs, which could be 100s of thousands of objects. Using the List Blobs API, we can fetch up to 5000 results per API request, it requires using a cursor to request subsequent results. For a million blobs, this means at least 200 API requests, which is likely to take awhile.

Simple design, but requires enumerating the full blob list:

  1. Conditionally (not-modified-since) update metadata on the source blob to <ExpiresAt>123</ExpiresAt>
  2. List all blobs
  3. Conditionally (not-modified-since) delete blobs in which <ExpiresAt>123</ExpiresAt> has passed.
  4. When fetching a blob, treat it as a "not found" if ExpiresAt is metadata is present.
  5. Must ensure that any PUTs to active blob store also reset metadata on the blob.

#6 Updated by Tom Clegg over 5 years ago

Could we...
  1. Trash → set <ExpiresAt>123</ExpiresAt> and create a marker (just an empty blob) named trash/0000123/hash
  2. Untrash → unset ExpiresAt and delete trash/0000123/hash
  3. EmptyTrash → index trash/* in order until the timestamps start being in the future. For each trash marker, get the modtime and ExpiresAt for the hash; if the data is still old and ExpiresAt is still set, delete it (otherwise do nothing); delete the trash marker in any case.

Doing a full index shouldn't be toooo scary, since we already need to do that on every data manager run...

#7 Updated by Peter Amstutz over 5 years ago

Tom Clegg wrote:

Could we...
  1. Trash → set <ExpiresAt>123</ExpiresAt> and create a marker (just an empty blob) named trash/0000123/hash
  2. Untrash → unset ExpiresAt and delete trash/0000123/hash
  3. EmptyTrash → index trash/* in order until the timestamps start being in the future. For each trash marker, get the modtime and ExpiresAt for the hash; if the data is still old and ExpiresAt is still set, delete it (otherwise do nothing); delete the trash marker in any case.

Doing a full index shouldn't be toooo scary, since we already need to do that on every data manager run...

That's a good point. In that case, the simple solution (use metadata) might be the way to go. We could make note of deleteable blocks as part of index scanning.

#8 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)

#9 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)

#10 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)

#11 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)
  • Category set to Keep

#12 Updated by Brett Smith over 5 years ago

  • Target version changed from Arvados Future Sprints to 2016-03-30 sprint

#13 Updated by Brett Smith over 5 years ago

  • Story points set to 3.0

#14 Updated by Radhika Chippada over 5 years ago

  • Assigned To set to Radhika Chippada

#15 Updated by Radhika Chippada over 5 years ago

  • Description updated (diff)

#16 Updated by Radhika Chippada over 5 years ago

  • Status changed from New to In Progress

#17 Updated by Tom Clegg over 5 years ago

  • Description updated (diff)

#18 Updated by Tom Clegg over 5 years ago

Turns out retrieving blob metadata with list responses is not yet supported in the Azure Go SDK. Without this, it's not feasible to omit trashed blocks from index responses. Possibilities:
  • Include trash in index responses. Datamanager will keep seeing the trash, and repeatedly tell keepstore to trash it; keepstore will keep trashing it, and as a side effect the block's last-modified date will be updated, which will prevent the blocks from getting old enough to delete -- unless we stop storing the "expires at" metadata (perhaps "trash" should avoid updating "expires at" anyway when it's already set?). Datamanager will also produce misleading statistics: if a lot of data is mistakenly sent to the trash, subsequent Datamanager runs will report reassuringly large amounts of live/untrashed data, which detracts somewhat from the benefit of having the trash phase. Keep-rsync will try to copy blocks that are in the trash, and error out (or we could fix it so it logs errors and continues).
  • Store the trash flag in properties instead of metadata? Properties aren't user-defined: nowhere to put it.
  • Get the blob metadata for each item in the list using SDKs that do exist? Too slow: each "index" would have to do a separate query for every single block in the system.
  • Before generating index, get a list of trash markers, and omit those blocks from the index response? To avoid omitting blocks that have been written since being trashed, we would have to omit only blocks with last-trashed >= last-modified. So far our plan does not involve storing last-trashed, but we could add this: trash markers would be something like trash/000000000123/hash/000000000099 where 123 is the empty-trash deadline and 99 is the time the trash marker was created. This naming scheme would become redundant as soon as the SDK gets "list blobs with metadata" support.
  • Add "list blobs with metadata" support to the SDK and submit a PR.

#19 Updated by Radhika Chippada over 5 years ago

Regarding adding "list blobs with metadata" support to the SDK and submit a PR:

  • It appears (as Peter pointed out in IRC), we can use "include={metadata}" to ListBlobs call to get the metadata headers.
  • Then add these to the Blob struct by adding Metadata map[string]string `xml:"Metadata"`
  • Add a test or two

Estimate: 1 story point

#20 Updated by Radhika Chippada over 5 years ago

When SetBlobMetadata to set ExpiresAt metadata attribute during Trash, the Mtime on the blob will be updated as well.

"Any write operation on the blob (including updates on the blob's metadata or properties) changes the last modified time of the blob." https://msdn.microsoft.com/en-us/library/azure/dd179414.aspx

Due to this, we will have failures if we try to invoke Trash on the same block more than once. The second call would find this block to be newer than blobSignatureTTL.

Ideally, we would need to set additional metadata attributes such as lastWrite and use this instead of Mtime to determine when a block should be considered trash. This would need updates to the Put func to set metadata after CreateBlockBlobFromReader.

#21 Updated by Brett Smith over 5 years ago

  • Status changed from In Progress to New
  • Assigned To deleted (Radhika Chippada)
  • Target version changed from 2016-03-30 sprint to Arvados Future Sprints

#22 Updated by Tom Clegg over 5 years ago

  • Target version changed from Arvados Future Sprints to 2016-05-11 sprint

#23 Updated by Tom Clegg over 5 years ago

  • Story points changed from 3.0 to 2.0

#24 Updated by Radhika Chippada over 5 years ago

  • Assigned To set to Radhika Chippada

#25 Updated by Radhika Chippada over 5 years ago

  • Status changed from New to In Progress

#26 Updated by Radhika Chippada over 5 years ago

b9d5f644

  • Trash/untrash are implemented in azure volume using the latest azure sdk
  • As explained in note 20, a new last_write_at metadata attribute is used to determine when a blob is actually updated / touched.

#27 Updated by Tom Clegg over 5 years ago

I'm not convinced this last_write_at strategy is safe. The fundamental issue seems to be the lack of an atomic "update metadata" operation. Consider the following race:

"empty trash" worker             "put" request handler

-----------------------------    -----------------------------
                                 client requests PUT abc123

                                 Touch(abc123) -> addToMetadata()
notice abc123 is in the trash
                                 get blob metadata
get blob metadata
                                 remove expires_at from local copy of metadata
see expires_at as expected
                                 add last_write_at=now to local copy of metadata
see last_write_at is old
                                 write new metadata with last_write_at=now
delete blob
                                 success! data for abc123 is safe
success! data for abc123 is gone

Is there anything preventing this? If not, any thoughts on how to fix it?

Meanwhile, a couple of notes on the code:

checkTrashed returns an error to indicate "yes, it's in the trash", which seems a bit weird. How about returning (bool, error) where true means it's in the trash, false means it's not in the trash, and a non-nil error means we don't know because we encountered an unexpected error? (The current code doesn't really care about the difference between "error" and "is in trash", but "is in trash" is generally an indication that something dangerous has happened -- if a client has a valid permission tokens for a block, it shouldn't be in the trash! -- so we should be logging/reporting this.)

v.translateError(NotFoundError) is redundant, since that just returns NotFoundError. But in any case, Get() is required to return something satisfying os.IsNotExist() (like os.ErrNotExist) if the block doesn't exist. IOW please fix the code so the IsNotExist tests pass, instead of weakening the tests with a strings.Contains() exemption.

The new checks in testTrashEmptyTrashUntrash seem like they could be in the checkGet() helper: if Get() fails, then Mtime and Compare must also fail with an IsNotExist() error; if Get() succeeds, then Mtime and Compare must also succeed.

In the Azure Trash() function we seem to have changed the sequence from "Etag, Mtime" to "Mtime, Etag", but left the comment in place that explains why the sequence is "Etag, Mtime". No fair! I think we need to change this back.

In EmptyTrash, the "noMoreOldMarkers" flag could be replaced by putting a label on the outer "for" loop:

blobListPage: for {
    ...
    for _, b := range resp.Blobs {
        ...
        if ... {
            break blobListPage
        }
    }
}

#28 Updated by Radhika Chippada over 5 years ago

I'm not convinced this last_write_at strategy is safe ... && In the Azure Trash() function we seem to have changed the sequence from "Etag, Mtime" to "Mtime, Etag" ...

I introduced this problem by switching the Etag order. Corrected this

checkTrashed returns an error to indicate "yes, it's in the trash", which seems a bit weird. How about returning (bool, error) where true means it's in the trash, false means it's not in the trash, and a non-nil error means we don't know because we encountered an unexpected error?

This is much better and I updated accordingly

v.translateError(NotFoundError) is redundant, since that just returns NotFoundError. But in any case, Get() is required to return something satisfying os.IsNotExist() (like os.ErrNotExist) if the block doesn't exist. IOW please fix the code so the IsNotExist tests pass, instead of weakening the tests with a strings.Contains() exemption.

Made this update

The new checks in testTrashEmptyTrashUntrash seem like they could be in the checkGet() helper: if Get() fails, then Mtime and Compare must also fail with an IsNotExist() error; if Get() succeeds, then Mtime and Compare must also succeed.

Moved Mtime and Compare into the Get() helper.

In EmptyTrash, the "noMoreOldMarkers" flag could be replaced by putting a label on the outer "for" loop:

Added the label

#29 Updated by Radhika Chippada over 5 years ago

Commit d5e4ac8e6

Check Etag while deleting a block during EmptyTrash routine.

#30 Updated by Tom Clegg over 5 years ago

Mtime needs to return a correct result for a blob that was written to Azure storage by an earlier version of keepstore (i.e., a blob with no last_write_at metadata). Its comment should also be updated to match its new behavior.

We have multiple sources of truth here, and I think it would help to write down the rules about the significance of each state and how it's possible to arrive at each one.

last_write_at expires_at trash marker result how can it happen?
none none none do not delete block was written by old version.
none none yes, 20s ago do not delete Trash() lost an addToMetadata race, then succeeded in adding a trash marker
none 20s ago yes, 20s ago delete block was written by old version, and is trash.
4w ago 20s ago yes, 20s ago delete block was written by new version, and is trash.
none/anything 20s ago none block cannot be retrieved, and is not included in index, but will never be deleted do we guarantee this can never happen?
30s ago 20s ago yes, 20s ago block was written recently, but will be deleted (!!) do we guarantee this can never happen?
30s ago none yes, 20s ago trash marker will be deleted, data will not be deleted block was written, trashed, and then rewritten

addToMetadata and removeFromMetadata are still prone to races:

"process trash list" goroutine                 "PUT request" handler
--------------------------------------------   --------------------------------------------
                                               get metadata
get metadata
                                               metadata has last_write_at=(old)
set metadata[expires_at]=now+TTL
                                               set metadata[last_write_at]=now
(leave metadata[last_write_at] alone)
                                               write metadata to Azure
write metadata to Azure
                                               success! last_write_at is fresh, so block is safe.
success! last_write_at is old, so block
is trashed / scheduled to be deleted.

In EmptyTrash, if GetBlobProperties() fails because the data blob does not exist, we should delete the trash marker. Otherwise, we'll keep retrying (and logging failures) indefinitely, which seems pointless.

checkTrashed still seems weird. It says "Return NotFoundError if trash marker is found on the block". Shouldn't it return true, nil in this case? That seems to be the way its return values are interpreted by callers. Also,
  • Elsewhere, "trash marker" means the placeholder blob called "trash.exp.hash". This function claims to check the trash marker, but it actually checks the metadata entry.
  • v.translateError(NotFoundError) should be just NotFoundError. Generally, (*AzureBlobVolume)translateError() translates Azure SDK errors: we shouldn't be passing it other errors, like the ones returned by strconv.ParseInt().

if trashed == true should be just if trashed

It seems like nearly every function that uses checkTrashed() already does some other Azure API call that returns metadata, which seems wasteful. E.g., in Mtime:
  1. checkTrashed → GetBlobMetadata
  2. addToMetadata → GetBlobMetadata; SetBlobMetadata

go vet shows some missing arguments for Printf()

#31 Updated by Brett Smith over 5 years ago

  • Target version changed from 2016-05-11 sprint to 2016-05-25 sprint

#32 Updated by Radhika Chippada over 5 years ago

A lot changed in 22db012b

  • Enhanced Azure SDK to support sending extra headers in SetBlobMetadata requests so that we do not have race conditions between two different threads doing trashing and putting.
  • Got rid off the trash marker and last-write-at extra metadata attribute. We now go through all the blobs in storage during empty trash routine

Addressed the comments in note 30:

Mtime needs to return a correct result for a blob that was written to Azure storage by an earlier version of keepstore (i.e., a blob with no last_write_at metadata). Mtime comment should also be updated to match its new behavior.

Reverted back to previous implementation of Mtime since no longer using last-write-at, after checking if trashed and hence this issue is no longer applicable.

We have multiple sources of truth here, and I think it would help to write down the rules about the significance of each state and how it's possible to arrive at each one.

Since we no longer use the trash marker and last_write_at + including If-Match during SetBlobMetadata, we do not have these problems now

addToMetadata and removeFromMetadata are still prone to races:

Addressed with the SDK update to include If-Match during SetBlobMetadata

In EmptyTrash, if GetBlobProperties() fails because the data blob does not exist, we should delete the trash marker. Otherwise, we'll keep retrying (and logging failures) indefinitely, which seems pointless.

Got rid off the trash marker altogether per our discussion

checkTrashed still seems weird. It says "Return NotFoundError if trash marker is found on the block". Shouldn't it return true, nil in this case? That seems to be the way its return values are interpreted by callers.

Updated it

v.translateError(NotFoundError) should be just NotFoundError. Generally, (*AzureBlobVolume)translateError() translates Azure SDK errors: we shouldn't be passing it other errors, like the ones returned by strconv.ParseInt().

Updated it

if trashed == true should be just if trashed

Done

It seems like nearly every function that uses checkTrashed() already does some other Azure API call that returns metadata, which seems wasteful. E.g., in Mtime:

Got rid of the addTo and removeFrom metadata methods and simplified to reuse metadata from the first request as much as possible

go vet shows some missing arguments for Printf()

Corrected these

#33 Updated by Tom Clegg over 5 years ago

This looks better, thanks.

The purpose of the "touch" metadata is to ensure the Last-Modified timestamp really gets updated in Touch() even if the storage service has "don't update Last-Modified if an update does not result in any modifications" logic. It isn't meaningful, and it's the only metadata that could be overwritten during Trash() (right?) so Trash() could skip the GetBlobMetadata call, and just set metadata to map[string]string{"expires_at": fmt.Sprintf(...)}

In EmptyTrash():
  • We should count blocksInTrash++ only for blocks that are trash, not all blocks.
  • We should not need to call GetBlobProperties or GetBlobMetadata to learn whether a block is trash. Instead, we should use Include: "metadata" in ListBlobsParameters, and just rely on the metadata and properties given in the list response.
  • We should fix our stub server so (like the real server) it doesn't return metadata in a ListBlobs response unless asked. That (plus better testing, see below) should reveal that we forgot to use Include: "metadata" in IndexTo, too, which means we will never get any metadata and we will erroneously include trash in the index response.

Why add trashLifetime here? Shouldn't we delete it as soon as it expires? (Also, this should be phrased "if not expired { continue }" like the conditions before it.)

if expiresAt <= time.Now().Add(-1*trashLifetime).Unix()

testDeleteOldBlock looks like it's missing a || !os.IsNotExist(err) ...

       _, err := v.Mtime(TestHash)
       if err == nil {
               t.Fatalf("os.IsNotExist(%v) should have been true", err)

It looks like we aren't testing that a trashed (but not yet deleted) block is omitted by IndexTo. We could do this in testTrashEmptyTrashUntrash's checkGet func, something like this:

        n, err := v.Get(TestHash, buf)
        if err != nil {
            if err2 := v.Touch(TestHash); err2 == nil || !os.IsNotExist(err2) {
                t.Fatalf("Get returned %q but Touch returned %q", err, err2)
            }
            // (try Mtime and IndexTo)
            return err
        }

#34 Updated by Radhika Chippada over 5 years ago

... Trash() could skip the GetBlobMetadata call, and just set metadata to map[string]string{"expires_at": fmt.Sprintf(...)}

Skipped the GetBlobMetadata (which I earlier added for last-write-at)

In EmptyTrash() - We should count blocksInTrash++ only for blocks that are trash, not all blocks.

Corrected this

In EmptyTrash() - We should not need to call GetBlobProperties or GetBlobMetadata to learn whether a block is trash. Instead, we should use Include: "metadata" in ListBlobsParameters, and just rely on the metadata and properties given in the list response.

Oooops. The whole point of updating the Azure lib was for this!! Corrected this

We should fix our stub server so (like the real server) it doesn't return metadata in a ListBlobs response unless asked. That (plus better testing, see below) should reveal that we forgot to use Include: "metadata" in IndexTo, too, which means we will never get any metadata and we will erroneously include trash in the index response.

Updated our stub to honor Include parameter

Why add trashLifetime here? Shouldn't we delete it as soon as it expires? (Also, this should be phrased "if not expired { continue }" like the conditions before it.)

Updated this

testDeleteOldBlock looks like it's missing a || !os.IsNotExist(err) ...

The translateError in S3Volume is not checking for "Not Found" in error and the test fails for S3 volumes. I can update translateError for S3Volume to do more if acceptable.

It looks like we aren't testing that a trashed (but not yet deleted) block is omitted by IndexTo.

The checkGet func also is doing IndexTo (line 803).

#35 Updated by Tom Clegg over 5 years ago

bytesDeleted and bytesInTrash are reported, but aren't actually being counted.

This should be phrased "if expiresAt > time.Now().Unix() { continue }" like the conditions before it, instead of indenting the remainder of the loop:

+                       if expiresAt <= time.Now().Unix() {
+                               err = v.bsClient.DeleteBlob(v.containerName, b.Name, map[string]string{
+                                       "If-Match": b.Properties.Etag,
+                               })
+                               if err != nil {
+                                       log.Printf("EmptyTrash: DeleteBlob(%v): %v", b.Name, err)
+                                       continue
+                               }
+                               blocksDeleted++
+                       }

In the stub server code, in case r.Method == "PUT" && r.Form.Get("comp") == "metadata":, the statement blob.Metadata = make(map[string]string) should not be removed. The Azure server replaces the entire metadata map rather than merging the new values with existing values (right?), so the stub should do the same.

Comment on checkTrashed should be updated (says "Return NotFoundError if..." should be "Return true if...")

This seems unnecessarily verbose, suggest

-                       expiresAtMetadata := b.Metadata["expires_at"]
-                       if expiresAtMetadata == "" {
+                       if b.Metadata["expires_at"] == "" {
                                continue
                        }

I think we should fix the S3 error, if it's as easy as it sounds.

#36 Updated by Radhika Chippada over 5 years ago

bytesDeleted and bytesInTrash are reported, but aren't actually being counted.

Corrected this

This should be phrased "if expiresAt > time.Now().Unix() { continue }"

Updated

In the stub server code, in case r.Method "PUT" && r.Form.Get("comp") "metadata":, the statement blob.Metadata = make(map[string]string) should not be removed.

Updated

Comment on checkTrashed should be updated (says "Return NotFoundError if..." should be "Return true if...")

Updated

This seems unnecessarily verbose, suggest

Updated

I think we should fix the S3 error, if it's as easy as it sounds.

Updated this and updated the testDeleteOldBlock test.

#37 Updated by Tom Clegg over 5 years ago

LGTM, thanks. This branch contains a fair bit of reversed changes so I suggest squashing it into one commit. The easiest way seems to be

cd path_to_arvados
git fetch
git checkout -b 8556-azure-trash origin/master
git diff origin/master...8556-trash-untrash-azure-volume | patch -p1
git commit

This should give you a new branch called 8556-azure-trash with all the same changes as the original branch, but all in one commit, and based on current master.

#38 Updated by Radhika Chippada over 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:aa1d6f9c5e1e21ceedf855bdb2f6db9154f26669.

Also available in: Atom PDF