Feature #17749

closed

[Keep] avoid AWS S3 request limits -- add option to use more prefixes on S3

Added by Ward Vandewege about 1 year ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Start date: 09/21/2021
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 2.0
Release relationship: Auto

Description

AWS has a hard request limit of 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in an Amazon S3 bucket, cf. https://aws.amazon.com/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling/.

Prefixes are defined as follows (cf. https://aws.amazon.com/premiumsupport/knowledge-center/s3-prefix-nested-folders-difference/):

  A prefix is the complete path in front of the object name, which includes the bucket name. For example,
  if an object (123.txt) is stored as BucketName/Project/WordFiles/123.txt, the prefix is
  "BucketName/Project/WordFiles/". If the 123.txt file is saved in a bucket without a specified path, the
  prefix value is "BucketName/".

Keep currently does not store its blocks in subdirectories in the S3 buckets it uses. That means the prefix value for all blocks in a particular bucket is "BucketName/", so the per-prefix request limits effectively apply to the bucket as a whole.

At some point, we may run into the request limits, particularly in a situation where one S3 bucket is shared among many keepstores, e.g. after #16516 is implemented.

The fix would be to use more prefixes in each S3 bucket, perhaps adopting the same pattern keepstore uses when backed by POSIX filesystems.
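As a rough illustration of why more prefixes help, the sketch below takes the per-prefix figures quoted above at face value and assumes keys are split on the first few hexadecimal characters of the block hash, with requests spread evenly across the resulting prefixes. The function names are hypothetical, and the totals are a back-of-the-envelope ceiling, not a guarantee of what S3 will deliver.

```python
# Per-prefix request rates quoted in the description above.
PUT_LIMIT_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE per second
GET_LIMIT_PER_PREFIX = 5_500   # GET/HEAD per second


def prefix_count(prefix_length: int) -> int:
    """Number of distinct prefixes when keys are split on the first
    prefix_length hexadecimal characters (assumption for illustration)."""
    return 16 ** prefix_length


def aggregate_limits(prefix_length: int) -> tuple[int, int]:
    """Rough (write, read) requests-per-second ceiling for one bucket,
    assuming requests are spread evenly over all prefixes."""
    n = prefix_count(prefix_length)
    return n * PUT_LIMIT_PER_PREFIX, n * GET_LIMIT_PER_PREFIX


if __name__ == "__main__":
    for length in (0, 1, 2, 3):
        writes, reads = aggregate_limits(length)
        print(f"prefix length {length}: {prefix_count(length)} prefixes, "
              f"{writes} writes/s, {reads} reads/s")
```

With a prefix length of zero there is a single prefix and the quoted limits apply to the whole bucket; each additional hexadecimal character multiplies the number of prefixes by 16.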

There is another reason to do this: buckets with a very large number of blocks become slow in certain (external) tools like aws s3 sync, because getting a list of all those files on S3-compatible storage can be slow. The proposal is to make Keep on S3 optionally use a directory structure like the one we use on POSIX disks. Add a config option for this, default off.

  • The config option could specify where you want the slashes: a prefix length that defaults to zero (feature off), with three recommended if you want to enable this feature on S3.
  • The migration path for an existing S3 bucket with data is out of scope (migration could be handled with a script). We could do that in a future story.
  • Will need to update both S3 drivers
  • Same logic would apply to the trash folder in this scenario
  • Will need some new tests
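The key layout described in the list above could be sketched roughly as follows. This is a minimal illustration, not the keepstore implementation: the function names, the `prefix_length` parameter name, and the literal `trash/` key prefix are assumptions made for the example.

```python
def block_key(locator: str, prefix_length: int = 0) -> str:
    """Map a block hash to an object key.

    With prefix_length == 0 (the default) the key is the bare hash, i.e. the
    current flat layout. With a positive prefix_length, the first
    prefix_length characters of the hash become a "directory" component.
    """
    if 0 < prefix_length < len(locator):
        return f"{locator[:prefix_length]}/{locator}"
    return locator


def trash_key(locator: str, prefix_length: int = 0) -> str:
    """Apply the same layout under the trash folder.

    The "trash/" key prefix is an assumption for this sketch.
    """
    return "trash/" + block_key(locator, prefix_length)


if __name__ == "__main__":
    h = "acbd18db4cc2f85cedef654fccc4a4d8"
    print(block_key(h))        # flat layout
    print(block_key(h, 3))     # three-character prefix
    print(trash_key(h, 3))     # same layout inside the trash folder
```

Because the key is derived from the hash and the configured length alone, changing the prefix length on a bucket that already holds data would make existing blocks unreachable under their old keys, which is why migration is called out separately above.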

Subtasks 1 (0 open, 1 closed)

Task #18153: Review 17749-s3-prefixes (Resolved, Tom Clegg, 09/21/2021)

Related issues

Related to Arvados Epics - Story #16516: Run Keepstore on local compute nodes (Resolved, 10/01/2021, 11/30/2021)

Actions #1

Updated by Ward Vandewege about 1 year ago

  • Description updated (diff)
Actions #2

Updated by Ward Vandewege about 1 year ago

  • Description updated (diff)
  • Subject changed from [Keep] investigate AWS S3 request limits to [Keep] avoid AWS S3 request limits
Actions #3

Updated by Ward Vandewege about 1 year ago

  • Related to Story #16516: Run Keepstore on local compute nodes added
Actions #4

Updated by Ward Vandewege about 1 year ago

  • Description updated (diff)
Actions #5

Updated by Ward Vandewege about 1 year ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz 12 months ago

  • Target version deleted (To Be Groomed)
Actions #7

Updated by Ward Vandewege 11 months ago

  • Description updated (diff)
Actions #8

Updated by Ward Vandewege 11 months ago

  • Subject changed from [Keep] avoid AWS S3 request limits to [Keep] avoid AWS S3 request limits -- add option to use more prefixes on S3
Actions #9

Updated by Ward Vandewege 11 months ago

  • Story points set to 2.0
  • Description updated (diff)
Actions #10

Updated by Ward Vandewege 11 months ago

  • Target version set to 2021-09-01 sprint
Actions #11

Updated by Peter Amstutz 11 months ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Actions #12

Updated by Peter Amstutz 10 months ago

  • Target version changed from 2021-09-15 sprint to 2021-09-29 sprint
Actions #13

Updated by Tom Clegg 10 months ago

  • Assigned To set to Tom Clegg
Actions #14

Updated by Tom Clegg 9 months ago

  • Status changed from New to In Progress
  • Category set to Keep

I'm not sure I've made the docs clear enough, particularly the part about when/why not to change PrefixLength.

17749-s3-prefixes @ adccfe35ccc68a865a2fd2356ca2b81e0366a4b4 -- developer-run-tests: #2699

Actions #15

Updated by Ward Vandewege 9 months ago

Tom Clegg wrote:

  I'm not sure I've made the docs clear enough, particularly the part about when/why not to change PrefixLength.

  17749-s3-prefixes @ adccfe35ccc68a865a2fd2356ca2b81e0366a4b4 -- developer-run-tests: #2699

LGTM, thanks!

Actions #16

Updated by Tom Clegg 9 months ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados|5dbf72803717f58b4848b6a6490375450916e84d.

Actions #17

Updated by Peter Amstutz 7 months ago

  • Release set to 42