Bug #7167

Updated by Tom Clegg over 8 years ago

This script is aimed at system administrators who are migrating a cluster from one installation to another. It copies Keep data from the old cluster to the new one in a way that is efficient both for the migration itself and for accessing data on the destination cluster (in other words, blocks live on services early in their rendezvous hash order). 
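
The phrase "early in their rendezvous hash order" can be illustrated with a generic highest-random-weight sketch. This is not necessarily the exact algorithm Keep uses: the MD5-based weight, the function name @rendezvous_order@ and the service identifiers are assumptions made for illustration only.

```python
import hashlib


def rendezvous_order(block_hash, service_ids):
    """Return service_ids sorted by a per-block pseudo-random weight.

    Generic rendezvous (highest-random-weight) hashing: each service gets a
    weight derived from the block hash and the service identifier, and the
    services are probed from the highest weight down.
    """
    def weight(service_id):
        # Assumption: MD5 over the concatenated strings is the weight.
        return hashlib.md5((block_hash + service_id).encode()).hexdigest()

    return sorted(service_ids, key=weight, reverse=True)


if __name__ == "__main__":
    services = ["svc-a", "svc-b", "svc-c", "svc-d"]  # hypothetical names
    print(rendezvous_order("acbd18db4cc2f85cedef654fccc4a4d8", services))
```

A block written to the first services in this order is found on the first probes by any client that computes the same order, which is why writing through the standard algorithm keeps reads on the destination cluster cheap.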

 h2. Functional requirements 

 * The script dynamically finds all blocks available on the source cluster. 
 * The script gets each block from the source cluster exactly once and writes it to the destination cluster, using standard Keep APIs and algorithms (e.g., rendezvous hashing). 
 * The script includes a checkpointing mechanism so that, if the process is interrupted, it has a record of which blocks have already been copied and does not re-send them. 
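
One possible shape for the checkpointing requirement is an append-only file of copied block locators. This is a minimal sketch, not a prescribed design: the @Checkpoint@ class, the @copy_all@ function and the one-locator-per-line file format are hypothetical.

```python
import os


class Checkpoint:
    """Append-only record of block locators that were already copied."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        # On restart, reload whatever an earlier run managed to record.
        if os.path.exists(path):
            with open(path) as f:
                self.done = {line.strip() for line in f if line.strip()}

    def mark_done(self, locator):
        if locator in self.done:
            return
        with open(self.path, "a") as f:
            f.write(locator + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the record survive a crash
        self.done.add(locator)


def copy_all(locators, copy_block, checkpoint):
    """Copy every locator not yet recorded; return how many were copied."""
    copied = 0
    for locator in locators:
        if locator in checkpoint.done:
            continue  # already sent by an earlier, interrupted run
        copy_block(locator)
        checkpoint.mark_done(locator)  # record only after a successful copy
        copied += 1
    return copied
```

Because a locator is recorded only after its copy succeeds, an interruption can cause at most the in-flight block to be sent again on the next run.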

 TBD: Should this script determine the desired replication level and write to the destination cluster accordingly? Or should it just write one or two copies of every block and let Data Manager adjust replication from there? 

 h2. Implementation 

 keep-rsync 
 * Accepts "src" and "dst" arguments and reads settings/conf files just like arv-copy. 
 * Accepts command-line arguments for (or reads from settings files) the source and destination data manager key and blob signing key. These are needed to get all indexes and data blocks, respectively. 
 * Accepts a replication argument (defaults to whatever is advertised in the destination discovery document). 
 * Accepts a "prefix" argument that passes through to index requests on both sides. This makes it possible to divide the work into (e.g.) 16 asynchronous jobs, one for each hex digit. 
 * Gets indexes from the source and destination keepstores[1]. 
 * Gets data from source keepstores/keepproxy, stores in destination using configured replication level. 
 * Uses regular SDK functions to get and put blocks. 
 * Displays progress. 
 ** "getting indexes: 10... 9... [...]" (counts down the number of indexes still to fetch) 
 ** "copying data block 1 of 1234 (0% done, ETA 2m3s): acbd18db4cc2f85cedef654fccc4a4d8+3" 
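
The index comparison and the progress line described above can be sketched as follows. This is an illustration under stated assumptions, not the keep-rsync implementation: all function names are hypothetical, each index line is assumed to hold a locator followed by a timestamp, and the prefix is applied locally here rather than being passed through to the index requests.

```python
def parse_index(text):
    """Return the set of block locators in one index listing.

    Assumption: one "locator timestamp" pair per line; blank lines are skipped.
    """
    locators = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            locators.add(fields[0])
    return locators


def blocks_to_copy(source_indexes, destination_indexes, prefix=""):
    """Locators present on the source but missing on the destination."""
    src = set()
    for text in source_indexes:
        src |= parse_index(text)
    dst = set()
    for text in destination_indexes:
        dst |= parse_index(text)
    return sorted(loc for loc in src - dst if loc.startswith(prefix))


def format_eta(seconds):
    """Render a duration in seconds as e.g. "2m3s"."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs}s"


def progress_line(n, total, eta_seconds, locator):
    """Build the per-block progress message for block n of total."""
    percent = (n - 1) * 100 // total  # blocks finished before this one
    return (f"copying data block {n} of {total} "
            f"({percent}% done, ETA {format_eta(eta_seconds)}): {locator}")
```

Splitting the work by prefix then amounts to running one such comparison per hex digit, each with its own list of blocks to copy.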

 h3. Example 

 How to use in a migration: 
 * Turn off data manager on destination cluster. 
 * Run keep-rsync. 
 * Disable access to source cluster. 
 * Dump database and restore to destination cluster. 
 * Run keep-rsync again to copy any blocks written to the source cluster since the first run. 
