Idea #15641

Updated by Tom Clegg almost 5 years ago

When data is distributed across multiple volumes in a cloud environment, rendezvous hashing should be based on volume ID rather than on keepstore server: 
 * all keepstore servers access all volumes 
 * client/proxy uses rendezvous hash to sort/choose from the volume(s) in the cluster config, and connects to the keepstore server(s) that have access to the chosen volumes 
 * keepstore uses rendezvous hash to sort/choose from the volumes it has access to 
 * keep-balance uses rendezvous hash to choose preferred volume(s) where a blob should be stored, and when pulling/trashing, chooses a random/arbitrary keepstore from the ones that have write access to the relevant volume 
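The volume-based scheme above can be sketched as follows. This is a minimal illustration, not the Arvados implementation. Assumptions: the rendezvous weight is the MD5 hex digest of the block hash concatenated with the volume ID (any stable hash shared by all components would do); `rendezvous_weight`, `volume_order` and `plan_writes` are hypothetical names; the volume IDs and server names are made up; and `random.choice` stands in for "a random/arbitrary keepstore" that has write access to the chosen volume.

```python
import hashlib
import random
from typing import Dict, List, Tuple


def rendezvous_weight(block_hash: str, volume_id: str) -> str:
    # Assumed weight function: MD5 hex digest of block hash + volume ID.
    # Any stable hash works as long as every component uses the same one.
    return hashlib.md5((block_hash + volume_id).encode("utf-8")).hexdigest()


def volume_order(block_hash: str, volume_ids: List[str]) -> List[str]:
    # Highest-weight volume first. The order depends only on the block hash and
    # the volume IDs, not on which process computes it or on the input order.
    return sorted(volume_ids, key=lambda v: rendezvous_weight(block_hash, v), reverse=True)


def plan_writes(block_hash: str, volume_access: Dict[str, List[str]],
                replicas: int) -> List[Tuple[str, str]]:
    # Walk volumes in rendezvous order; for each, pick an arbitrary keepstore
    # server that can write to it (hypothetical helper, not an Arvados API).
    plan = []
    for vol in volume_order(block_hash, list(volume_access)):
        servers = volume_access[vol]
        if not servers:
            continue  # no server can reach this volume; fall through to the next one
        plan.append((vol, random.choice(servers)))
        if len(plan) == replicas:
            break
    return plan


# Hypothetical cluster: three volumes, and every keepstore can access every volume.
volume_access = {
    "zzzzz-nyw5e-000000000000000": ["keep0", "keep1"],
    "zzzzz-nyw5e-000000000000001": ["keep0", "keep1"],
    "zzzzz-nyw5e-000000000000002": ["keep0", "keep1"],
}
block = "acbd18db4cc2f85cedef654fccc4a4d8"
print(volume_order(block, list(volume_access)))
print(plan_writes(block, volume_access, replicas=2))
```

Because each volume's weight depends only on the block hash and that volume's ID, a keepstore that calls `volume_order` on just the volumes it can access gets the same relative order as a client sorting every volume in the cluster config. keep-balance can treat the first N entries as the preferred volumes for a block.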

 (Current code uses rendezvous hashing to select a server, then sorts/chooses among that server's volumes in random/arbitrary order. This causes unnecessary bottlenecks between clients and buckets when each server has one writable bucket (so each bucket has only one writing server), and excessive keepstore-to-backend probing when each server has multiple writable buckets.) 
