Bug #13427
closed
[keep-balance] Handle volumes that are mounted simultaneously by multiple servers
Added by Tom Clegg over 6 years ago.
Updated over 6 years ago.
Release relationship:
Auto
Description
Example:
- keep0 mounts vol0 (rw), vol1 (ro)
- keep1 mounts vol1 (rw), vol0 (ro)
- keep2 mounts vol2 (rw), vol3 (ro)
- keep3 mounts vol3 (rw), vol2 (ro)
This setup is desirable when each block appears on only one backend volume, i.e., when the desired replication level is already provided by the backend. When a single keep server goes down, all blocks are still readable.
However, with this setup, the current keep-balance implementation never moves a block to a better rendezvous position. It sees N read-only replicas and concludes there is no point making more copies on different servers: it cannot delete the read-only replicas, so any additional replicas would result in permanent overreplication. If keep-balance paid attention to the device IDs reported by the servers, it could recognize that the read-only replicas are just different views of writable replicas it sees elsewhere, and ignore them.
Implementation:
- In (*Balancer).Run(), de-duplicate devices after calling discoverMounts on all services. If the same device ID is reported by both read-only and read/write mounts, drop the read-only mounts entirely.
- In (*Balancer).balanceBlock(), track which devices are going to be used ("wantDev") and treat this like wantMnt: don't try to use the same device twice. (When a device is mounted read/write by multiple servers, we should prefer the mount in the best rendezvous position, which depends on the block, so we can't de-duplicate these ahead of time.)
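The two steps above can be sketched roughly as follows. This is an illustration only, not the keep-balance code: the `Mount` type and the `dedupMounts` and `pickMounts` helpers are hypothetical names, and `pickMounts` assumes its input is already sorted in rendezvous order for the block in question.

```go
package main

import "fmt"

// Mount is a hypothetical, simplified stand-in for one volume as
// reported by one keepstore server.
type Mount struct {
	Server   string
	DeviceID string
	ReadOnly bool
}

// dedupMounts drops every read-only mount whose device ID is also
// reported by at least one read/write mount. Mounts with an empty
// device ID are always kept, since nothing can be inferred about them.
func dedupMounts(mounts []Mount) []Mount {
	rw := map[string]bool{}
	for _, m := range mounts {
		if m.DeviceID != "" && !m.ReadOnly {
			rw[m.DeviceID] = true
		}
	}
	var out []Mount
	for _, m := range mounts {
		if m.ReadOnly && rw[m.DeviceID] {
			continue // just another view of a writable device
		}
		out = append(out, m)
	}
	return out
}

// pickMounts walks mounts in rendezvous order (best first) for a single
// block and selects up to want of them, never using the same device
// twice. This has to happen per block because the rendezvous order,
// and therefore the preferred mount for a shared device, varies by block.
func pickMounts(ordered []Mount, want int) []Mount {
	wantDev := map[string]bool{}
	var picked []Mount
	for _, m := range ordered {
		if len(picked) >= want {
			break
		}
		if m.DeviceID != "" {
			if wantDev[m.DeviceID] {
				continue // device already chosen via a better-placed mount
			}
			wantDev[m.DeviceID] = true
		}
		picked = append(picked, m)
	}
	return picked
}

func main() {
	// The keep0/keep1 half of the example in the description.
	mounts := []Mount{
		{Server: "keep0", DeviceID: "vol0", ReadOnly: false},
		{Server: "keep0", DeviceID: "vol1", ReadOnly: true},
		{Server: "keep1", DeviceID: "vol1", ReadOnly: false},
		{Server: "keep1", DeviceID: "vol0", ReadOnly: true},
	}
	for _, m := range dedupMounts(mounts) {
		fmt.Printf("%s %s ro=%v\n", m.Server, m.DeviceID, m.ReadOnly)
	}
}
```

In the example from the description, the first step alone leaves one mount per device. The per-block `wantDev` check only matters when the same device is mounted read/write by more than one server.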
- Target version changed from To Be Groomed to Arvados Future Sprints
- Target version changed from Arvados Future Sprints to 2018-06-06 Sprint
- Assigned To set to Tom Clegg
- Status changed from New to In Progress
- Target version changed from 2018-06-06 Sprint to 2018-06-20 Sprint
13427-multiple-mounts @ da40bd0960806df8e2799e4fb716d41ad08b169f
- fix reported stats (count 1 replica, not 2, if it appears twice on the same device ID at different mounts)
- de-duplicate index calls for RW-mounted devices (retrieve each index once, and apply it to all mounts with the same device ID)
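A minimal sketch of what these two changes might look like, under stated assumptions: the `Mount` type, `countReplicas`, and `indexCache` are hypothetical names, and the `fetch` callback stands in for whatever call retrieves a mount's index. None of this is taken from the branch itself.

```go
package main

import "fmt"

// Mount is a hypothetical, simplified mount record.
type Mount struct {
	UUID     string
	DeviceID string
}

// countReplicas counts a block once per distinct device ID, so a block
// seen at two mounts of the same device is reported as one replica.
// Mounts without a device ID are each counted separately.
func countReplicas(holders []Mount) int {
	seen := map[string]bool{}
	n := 0
	for _, m := range holders {
		if m.DeviceID == "" {
			n++
			continue
		}
		if !seen[m.DeviceID] {
			seen[m.DeviceID] = true
			n++
		}
	}
	return n
}

// indexCache retrieves each device's index at most once and reuses the
// result for every other mount that reports the same device ID.
type indexCache struct {
	fetch func(mountUUID string) ([]string, error) // hypothetical index call
	byDev map[string][]string
}

func newIndexCache(fetch func(string) ([]string, error)) *indexCache {
	return &indexCache{fetch: fetch, byDev: map[string][]string{}}
}

func (c *indexCache) get(m Mount) ([]string, error) {
	if m.DeviceID != "" {
		if idx, ok := c.byDev[m.DeviceID]; ok {
			return idx, nil
		}
	}
	idx, err := c.fetch(m.UUID)
	if err != nil {
		return nil, err
	}
	if m.DeviceID != "" {
		c.byDev[m.DeviceID] = idx
	}
	return idx, nil
}

func main() {
	calls := 0
	cache := newIndexCache(func(uuid string) ([]string, error) {
		calls++
		return []string{"block-from-" + uuid}, nil
	})
	a := Mount{UUID: "mount-a", DeviceID: "vol0"}
	b := Mount{UUID: "mount-b", DeviceID: "vol0"}
	for _, m := range []Mount{a, b} {
		if _, err := cache.get(m); err != nil {
			fmt.Println("index error:", err)
			return
		}
	}
	fmt.Println("index calls:", calls)
	fmt.Println("replicas:", countReplicas([]Mount{a, b}))
}
```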
- Status changed from In Progress to Resolved