Feature #17749

Updated by Ward Vandewege 4 months ago

AWS has a hard request limit of 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second *per prefix* in an Amazon S3 bucket, cf. https://aws.amazon.com/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling/.

Prefixes are defined as follows (cf. https://aws.amazon.com/premiumsupport/knowledge-center/s3-prefix-nested-folders-difference/):

A prefix is the complete path in front of the object name, which includes the bucket name. For example,
if an object (123.txt) is stored as BucketName/Project/WordFiles/123.txt, the prefix is
“BucketName/Project/WordFiles/”. If the 123.txt file is saved in a bucket without a specified path, the
prefix value is "BucketName/".

Keep currently does not store its blocks in subdirectories in the S3 buckets it uses. That means the prefix value for all blocks in a particular bucket is "BucketName/", so the per-prefix request limits effectively apply to the entire bucket.
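
To illustrate the prefix definition quoted above, here is a minimal sketch that derives the prefix from a bucket name and an object key. The function name `s3_prefix` and the sample block hashes are hypothetical and only serve the illustration; this is not keepstore code.

```python
def s3_prefix(bucket: str, key: str) -> str:
    """Return the prefix as described in the AWS article: the bucket name
    plus the complete path in front of the object name."""
    # rpartition yields ("", "", key) when the key contains no "/",
    # so a flat key yields just "BucketName/".
    path, sep, _name = key.rpartition("/")
    return f"{bucket}/{path}{sep}"


# The example from the AWS article.
nested = s3_prefix("BucketName", "Project/WordFiles/123.txt")

# Hypothetical block hashes stored flat, as Keep does today:
# every block ends up under the same single prefix.
flat_keys = [
    "acbd18db4cc2f85cedef654fccc4a4d8",
    "37b51d194a7513e45b56f6524f2d51f2",
]
flat_prefixes = {s3_prefix("BucketName", k) for k in flat_keys}
```

With flat keys, `flat_prefixes` contains a single entry, which is why all requests for a bucket count against one prefix.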

At some point, we may run into the request limits, particularly in a situation where one S3 bucket is shared among many keepstores, e.g. after #16516 is implemented.

The fix would be to use more prefixes in each S3 bucket, perhaps adopting the same pattern keepstore uses when backed by POSIX filesystems.
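
A minimal sketch of what such a key layout could look like, assuming the POSIX convention of placing each block in a directory named after the first three hexadecimal characters of its hash. The function name `prefixed_key` and the `prefix_len` parameter are hypothetical, not existing keepstore options.

```python
HEX_DIGITS = set("0123456789abcdef")


def prefixed_key(block_hash: str, prefix_len: int = 3) -> str:
    """Map a 32-character hex block hash to an S3 key of the form
    "<first prefix_len chars>/<hash>" (assumed layout, see lead-in)."""
    if len(block_hash) != 32 or not set(block_hash) <= HEX_DIGITS:
        raise ValueError(f"not a 32-character lowercase hex hash: {block_hash!r}")
    if not 1 <= prefix_len <= len(block_hash):
        raise ValueError("prefix_len out of range")
    return f"{block_hash[:prefix_len]}/{block_hash}"


# MD5 of the three bytes "foo", used here only as a sample block hash.
key = prefixed_key("acbd18db4cc2f85cedef654fccc4a4d8")

# Number of distinct prefixes a 3-character hex directory name can produce.
num_prefixes = 16 ** 3
```

With three hex characters there are 4096 possible prefixes per bucket. Since block hashes are spread roughly evenly, requests would be spread across those prefixes instead of hitting a single one.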

There is another reason to do this: buckets with a very large number of blocks become slow in certain (external) tools like aws s3 sync, because getting a list of all those files on S3-compatible storage can be slow. It would help to have an option to make Keep on S3 use a structure like the one we use on POSIX disks.