Feature #13126
[keep] Investigate using signed URLs to delegate access to cloud buckets
Description
Currently, keepstore is the gateway to the backend object store, so all data has to flow through the keepstore servers. This is a bottleneck, which ops usually address by using more expensive keepstore nodes (to get more bandwidth) or by adding more keepstore nodes.
Some object storage systems, such as S3, have the concept of "signed URLs". These are similar to Arvados signing tokens: the URL carries a signature derived from a secret, which gives time-limited access to read a specific object.
Investigate the performance/scaling behavior of the following alternate flow:
- client requests a block from keepstore
- keepstore receives and validates the request as normal
- keepstore obtains a signed URL for the block from the backend object store
- keepstore responds to the client with a 302 redirect to the signed URL
- client receives the redirect and makes a new request to fetch the block content directly from the signed URL
- client checks the block's MD5 checksum and proceeds as normal, or tries another keepstore if there is an error
The benefit of this approach is that the data transfer load moves off keepstore, and compute nodes communicate directly with the object store. This should scale better. However, there is also a potential latency penalty from the extra "request signed URL and redirect" step, since the client makes two HTTP requests per block instead of one.
On AWS, signed URLs can also be used for PUT operations, and AWS permits signed URLs that assert that only data hashing to a specific MD5 will be accepted. However, keepstore needs to verify the block and return an Arvados signing token, so it is not clear how that would work with S3 signed URLs.
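One concrete detail for the PUT case: a Keep block locator begins with the block's MD5 as 32 hex digits. The S3 `Content-MD5` header instead carries the base64 encoding of the raw 16-byte digest. A keepstore issuing an MD5-constrained signed PUT URL would therefore need to convert between the two forms. The following is a minimal sketch; `contentMD5FromLocator` is a hypothetical helper name, and it takes everything before the first `+` in the locator as the hash.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// contentMD5FromLocator converts the hex MD5 at the start of a Keep locator
// ("<md5>+<size>[+hints]") into the base64 form used by the S3 Content-MD5 header.
func contentMD5FromLocator(locator string) (string, error) {
	hash, _, _ := strings.Cut(locator, "+")
	raw, err := hex.DecodeString(hash)
	if err != nil {
		return "", err
	}
	if len(raw) != md5.Size {
		return "", fmt.Errorf("expected %d-byte MD5, got %d bytes", md5.Size, len(raw))
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

func main() {
	data := []byte("foo")
	sum := md5.Sum(data)
	locator := fmt.Sprintf("%s+%d", hex.EncodeToString(sum[:]), len(data))
	h, err := contentMD5FromLocator(locator)
	fmt.Println(locator, h, err)
	// acbd18db4cc2f85cedef654fccc4a4d8+3 rL0Y20zC+Fzt72VPzMSk2A== <nil>
}
```

This only covers the integrity constraint on the upload. It does not resolve the open question of how keepstore would confirm the write and issue an Arvados signing token afterwards.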
Reference:
https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/s3-example-presigned-urls.html
Updated by Peter Amstutz over 6 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 6 years ago
- Subject changed from Investigate using signed URLs to delegate access to cloud buckets to [keep] Investigate using signed URLs to delegate access to cloud buckets
- Description updated (diff)
- Status changed from In Progress to New
Updated by Tom Morris over 6 years ago
Rather than starting with an answer, I'd like to see us start with a question or problem statement. In my mind, the goal is to remove all bottlenecks in accessing the storage layer. All cloud vendors provide highly scalable storage fabrics with reliable transport, integrity checksums, and permission mechanisms. To the extent that we can, we should leverage those capabilities rather than duplicating them.
Updated by Peter Amstutz over 3 years ago
- Target version deleted (To Be Groomed)