Story #7393

[Keep] Prototype S3 blob storage

Added by Brett Smith almost 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Start date:
09/23/2015
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
5.0

Description

The prototype should implement the Keep volume interface using S3 blob storage, including returning errors that are required to report problems in the underlying storage.

The prototype does not need to deal with non-essential errors like configuration problems, temporary network hiccups, etc.

Ideally the prototype will be developed in such a way that there's a clear path for further development to make it production-ready. However, in case of doubt or conflict, getting the prototype done in a timely manner to prove the concept overrides this concern.

The branch review should ensure that the prototype meets the functionality requirements and can meet known scalability requirements in production use. It doesn't need to address code style or issues with tests (although ideas for tests are good to capture).

Make sure the implementation can accommodate S3-compatible endpoints other than Amazon S3 proper. But it's OK if, in the first implementation, only Amazon S3 is supported/tested.

Refs

Subtasks

Task #7924: Review 7393-s3-volume (Resolved, Tom Clegg)

Task #7921: write tests using stub API (Resolved, Tom Clegg)

Task #7922: Add S3 volume type and cmd line args (Resolved, Tom Clegg)

Task #7923: Test against real S3 service using keepexercise (Resolved, Tom Clegg)


Related issues

Related to Arvados - Story #7988: [Keep] Single keepstore responsible for trash lists on S3 (Closed)

Blocks Arvados - Story #7934: [Keep] Test S3 block storage on AWS (Resolved, 02/01/2016)

Blocks Arvados - Story #7935: [Keep] Test S3 block storage on GCE (Resolved)

Blocks Arvados - Story #7936: [Keep] Test S3 block storage on Ceph Rados gateway (New)

Associated revisions

Revision 7d5d57a5
Added by Tom Clegg over 3 years ago

Merge branch '7393-s3-volume' closes #7393

History

#1 Updated by Brett Smith almost 4 years ago

  • Target version set to 2015-12-02 sprint

#2 Updated by Brett Smith almost 4 years ago

  • Target version changed from 2015-12-02 sprint to Arvados Future Sprints

#3 Updated by Tom Clegg almost 4 years ago

  • Description updated (diff)

#4 Updated by Tom Clegg over 3 years ago

  • Assigned To set to Tom Clegg
  • Target version changed from Arvados Future Sprints to 2015-12-16 sprint

#5 Updated by Brett Smith over 3 years ago

  • Status changed from New to In Progress

#6 Updated by Tom Clegg over 3 years ago

7393-s3-volume is at 069704e with the following known issues that (I think) we can merge with:

Delete-vs.-write race

The delete-vs.-write race is not handled. It is possible to write (refresh) an existing block between the time "T0" when the delete handler confirms that the block is old and the time "T1" when the block actually gets deleted. When this happens, PUT reports success even though the block gets deleted right away.

(Aside: AWS does not guarantee the block actually becomes ungettable before "delete" returns, so "T1" can be even later than when keepstore finishes its delete method.)

Current workarounds:
  • If you want to be safe and don't mind not having garbage collection, you're fine; delete is disabled by default.
  • If you want to do garbage collection and you aren't worried about the race, turn on -s3-unsafe-delete.

Odd error messages

AWS reports "access denied" instead of 404 when trying to read a nonexistent block during Compare and Get.
  • 2015/12/09 20:56:07 s3-bucket:"4xphq-keep": Compare(637821cc1c31b89272a25c1a6885cc8e): Access Denied
    

This might just be a problem with the way we've set up our test bucket permissions, though. The s3test stub server returns 404 as expected, so we pass the "report notfound as ErrNotExist" tests.

No docs

...other than the keepstore -help message.

Non-Amazon endpoints are untested

The options are there (-s3-endpoint) for using a non-AWS S3-compatible service like Google Storage, but the only services I've tried it on are AWS and the s3test server from https://github.com/AdRoll/goamz.

#7 Updated by Tom Clegg over 3 years ago

  • Description updated (diff)

#8 Updated by Peter Amstutz over 3 years ago

Tom Clegg wrote:

7393-s3-volume is at 069704e with the following known issues that (I think) we can merge with:

Delete-vs.-write race

The delete-vs.-write race is not handled. It is possible to write (refresh) an existing block between the time "T0" when the delete handler confirms that the block is old and the time "T1" when the block actually gets deleted. When this happens, PUT reports success even though the block gets deleted right away.

(Aside: AWS does not guarantee the block actually becomes ungettable before "delete" returns, so "T1" can be even later than when keepstore finishes its delete method.)

Current workarounds:
  • If you want to be safe and don't mind not having garbage collection, you're fine; delete is disabled by default.
  • If you want to do garbage collection and you aren't worried about the race, turn on -s3-unsafe-delete.

I spent a while reading the S3 documentation. The correct way to do this seems to be to enable versioning on the bucket. Then the head-and-delete operation will only delete the specific version of the object. This should solve the race because if there is a PUT or PUT-copy it will show up as a more recent version. As a side effect the "PUT-copy" operation used for Touch() may need to explicitly delete the old version.

#9 Updated by Peter Amstutz over 3 years ago

Object versioning in S3 compatible APIs:

Google:

Has a "generation" parameter that is very similar to Amazon's "versionId", except that it's a 64-bit integer where S3 uses a string.

https://cloud.google.com/storage/docs/object-versioning?hl=en

Ceph:

"x-amz-version-id" is listed under "Unsupported header fields", and the documentation makes no mention of versioning.

http://docs.ceph.com/docs/master/radosgw/s3/

#10 Updated by Peter Amstutz over 3 years ago

s3_volume_test has some commented-out code.

#11 Updated by Peter Amstutz over 3 years ago

(01:43:43 PM) Walex: gah, I actually came back before I forget, to say something obvious but that may be useful: the "standard" way to avoid this problem in distributed filesystems is to allow data operations to be done by any "keepstore", but to get all metadata operations to be done only by one "keepstore", e.g. the one "with the lowest IP address" as in AFS, or the one that managed first to acquire a certain "well known" lock. You could use that for Ceph but not the other systems
(01:48:33 PM) tetron_: Walex: actually, that's a great idea
(01:48:40 PM) tetron_: Walex: you're probably gone now
(01:48:58 PM) tetron_: Walex: but yea, we could have 1 writable server and N read-only servers

This would require some locking between the trash list and PUT handler in keepstore itself (maybe another story).

#12 Updated by Peter Amstutz over 3 years ago

One detail to check:

The S3 documentation for PUT-copy specifies:

x-amz-copy-source: /source_bucket/sourceObject

However the code constructs this string:

v.Bucket.Name+"/"+loc

Is the first '/' being added somewhere, or is S3 accepting it without the leading slash?

#13 Updated by Tom Clegg over 3 years ago

Peter Amstutz wrote:

Is the first '/' being added somewhere, or is S3 accepting it without the leading slash?

Interesting. The goamz s3 package leaves out the leading '/', s3test doesn't tolerate one, and amazon seems to add it implicitly if you leave it off (keep-exercise did lots of "touch" operations without any trouble)... I'd say this should be fixed in the SDK first, and then (depending on how the SDK fixes it) we should update our code.
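In the meantime, a caller-side normalization would be straightforward (`copySource` is a hypothetical helper for illustration, not code from the branch):

```go
package main

import (
	"fmt"
	"strings"
)

// copySource builds an x-amz-copy-source value with the leading
// slash the S3 docs show ("/source_bucket/sourceObject"), tolerating
// a bucket name that already carries one.
func copySource(bucket, key string) string {
	return "/" + strings.TrimPrefix(bucket, "/") + "/" + key
}

func main() {
	fmt.Println(copySource("4xphq-keep", "637821cc1c31b89272a25c1a6885cc8e"))
	// /4xphq-keep/637821cc1c31b89272a25c1a6885cc8e
}
```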

s3_volume_test has some commented-out code.

Whoops, removed. Thanks.

#14 Updated by Tom Clegg over 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:7d5d57a522489209e6b3cecfef94bab0aae4a7f5.
