Bug #7853

Updated by Tom Clegg over 6 years ago

h2. Background 

 Currently, Data Manager assumes that if two keepstore nodes report the same block B in their indexes, there are two copies of block B stored on two distinct volumes on two distinct nodes. This is not true in the recommended blob storage configuration, where multiple keepstore services share a single blob storage volume. 

 For example, in a blob storage configuration with 8 keepstore nodes sharing a single volume, Data Manager will see 8 copies and consider all blocks to be overreplicated. 

 This confusion also arises whenever two keepstore servers access the same backing store, even when their local paths differ (e.g., different local mount points attached to the same NFS mount). 

 A related problem is that volumes could be moved around after Data Manager decides what to delete, but before it takes action. This could result in too many replicas being deleted. 

h2. Resolution 

 Data Manager must compare the stored "last PUT" timestamps for each block in the trash list. It must ensure the timestamp on an excess copy does not match the timestamp on any of the still-needed copies. If the timestamps match, the two index entries might refer to the same physical copy, so the excess copy cannot be deleted safely. In that case: 
 * Data Manager _should_ perform some operation that will refresh the timestamp of the still-needed copies. This will allow the excess copy to be deleted in a subsequent run. 
 * Data Manager _must not_ delete the excess copy. 
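
The timestamp rule above can be sketched roughly as follows. This is an illustration only, not Data Manager's actual code: the @Replica@ type, the @partition_trash_candidates@ function, the server names, and the integer timestamps are all hypothetical, and the sketch assumes each index entry carries a single comparable "last PUT" value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One index entry for a block (hypothetical structure)."""
    server: str  # which keepstore server reported the block
    mtime: int   # "last PUT" timestamp from that server's index


def partition_trash_candidates(needed, excess):
    """Split excess replicas into (safe_to_trash, must_keep).

    An excess replica is kept whenever its timestamp equals the timestamp
    of any still-needed replica, because the two index entries might be
    the same physical copy seen through two different servers.
    """
    needed_mtimes = {r.mtime for r in needed}
    safe, keep = [], []
    for r in excess:
        if r.mtime in needed_mtimes:
            keep.append(r)  # collision: must not delete in this run
        else:
            safe.append(r)  # distinct timestamp: eligible for the trash list
    return safe, keep


# keep0 and keep1 report the same timestamp (possibly one shared volume);
# keep2 reports a different timestamp, so it is treated as a separate copy.
needed = [Replica("keep0", 1448000000)]
excess = [Replica("keep1", 1448000000), Replica("keep2", 1448000777)]
safe, keep = partition_trash_candidates(needed, excess)
```

In this sketch only the @keep2@ entry ends up in @safe@. The @keep1@ entry stays in @keep@ until the still-needed copy's timestamp has been refreshed and a later run sees the two timestamps differ.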

h2. Related issues 

 Not addressed in this story: 
 * If a client writes 1 copy to each of 2 keepstore services that use the same backing store, the client will erroneously conclude that it has achieved replication=2.