Feature #13126
open[keep] Investigate using signed URLs to delegate access to cloud buckets
Description
Currently, keepstore is the gateway to the backend object store: all data has to flow through the keepstore nodes. This is a bottleneck, which operators usually address by using more expensive keepstore nodes (to get more bandwidth) or by adding more keepstore nodes.
Some object storage systems, such as S3, have the concept of "signed URLs". A signed URL is similar to an Arvados signing token: it carries a secret that gives time-limited access to read a specific object.
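To make the idea concrete, here is a minimal sketch of a time-limited signed URL built with an HMAC. It is a generic illustration only, not the S3 presigning algorithm and not the Arvados token format; `SECRET`, `sign_url`, `verify_url`, and the `expires`/`signature` query parameters are hypothetical names, and `base_url` is assumed to have no path component.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical key shared by the signer and the object store.
SECRET = b"example-shared-secret"


def _signature(object_key, expires):
    # Sign the method, the object name, and the expiry time together,
    # so none of them can be changed without invalidating the URL.
    message = f"GET\n{object_key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base_url, object_key, ttl_seconds, now=None):
    """Return a URL that grants read access to object_key for ttl_seconds."""
    issued = int(now if now is not None else time.time())
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires,
                       "signature": _signature(object_key, expires)})
    return f"{base_url}/{object_key}?{query}"


def verify_url(url, now=None):
    """Return True if the signature matches and the URL has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path.lstrip("/"), expires)
    current = now if now is not None else time.time()
    # compare_digest avoids leaking the signature through timing differences.
    return (hmac.compare_digest(signature.encode(), expected.encode())
            and current < expires)
```

Anyone holding the URL can read that one object until it expires, without ever seeing the signing key, which is what lets keepstore hand it to a client.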
Investigate the performance/scaling behavior of the following alternate flow:
- client requests a block from keepstore
- keepstore receives and validates the request as normal
- keepstore requests a signed URL for the block from the backend object store
- keepstore returns a 302 redirect to the signed URL to the client
- client receives the redirect and makes a new request to fetch the block content from the signed URL
- client checks block md5sum and proceeds as normal, or tries another keepstore if there is an error
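The client side of this flow could look roughly like the sketch below. It is an illustration under simplifying assumptions, not the existing Keep client: the block is requested as `<keepstore URL>/<md5>`, the Arvados token and locator hints are omitted, and `fetch_block`, `fetch_from_any`, and `BlockMismatch` are hypothetical names. It relies on `urllib.request.urlopen` following the 302 redirect automatically.

```python
import hashlib
import urllib.error
import urllib.request


class BlockMismatch(Exception):
    """The fetched content did not hash to the expected MD5."""


def fetch_block(keepstore_url, md5_hex, timeout=10):
    """Fetch one block via keepstore and verify its MD5."""
    # urlopen follows the 302, so the body read here is served by
    # whatever the redirect points at (the signed URL in this flow).
    url = f"{keepstore_url}/{md5_hex}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    if hashlib.md5(data).hexdigest() != md5_hex:
        raise BlockMismatch(md5_hex)
    return data


def fetch_from_any(keepstore_urls, md5_hex):
    """Try each keepstore in turn until one yields a valid block."""
    last_error = None
    for base in keepstore_urls:
        try:
            return fetch_block(base, md5_hex)
        except (BlockMismatch, urllib.error.URLError, OSError) as err:
            # Network failure, HTTP error, or bad content: try the next one.
            last_error = err
    raise RuntimeError(f"no keepstore could supply block {md5_hex}") from last_error
```

Because the MD5 check happens after the redirect has been followed, a corrupt or stale object behind a signed URL is handled the same way as a keepstore error: the client moves on to the next keepstore.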
The benefit of this approach is that the data transfer load is moved off keepstore, and compute nodes communicate directly with the object store. This should scale better. However, there is also a potential latency penalty from adding the extra "request signed URL and redirect" operation.
On AWS, signed URLs can also be used for PUT operations. AWS permits signed URLs that assert that only data that hashes to a specific MD5 will be accepted. However, keepstore needs to verify the block and return an Arvados signing token; it is not clear how that would work with S3 signed URLs.
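One detail to keep in mind for the PUT case: Keep identifies blocks by the hex MD5 digest, while the HTTP `Content-MD5` header (which is what an MD5-restricted upload would presumably be signed over) carries the base64 encoding of the raw 16-byte digest. A small sketch of the conversion, with `content_md5_header` as a hypothetical helper name:

```python
import base64


def content_md5_header(md5_hex):
    """Convert a hex MD5 digest into the base64 form used by Content-MD5."""
    raw = bytes.fromhex(md5_hex)
    if len(raw) != 16:
        raise ValueError("an MD5 digest must be exactly 16 bytes")
    return base64.b64encode(raw).decode("ascii")
```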
Reference:
https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/s3-example-presigned-urls.html