Keep-balance
Keep-balance enforces policies and generates reports about storage resource usage. It interacts with the Keep server and the metadata database (through the API server). Clients/users do not interact with keep-balance directly.
See also:
- Keep server
- Keep manifest format
- source:services/keep-balance
- http://doc.arvados.org/install/install-keep-balance.html
Responsibilities:
- Garbage collector: decide what is eligible for deletion (and some partial order of preference)
- Replication enforcer: copy and delete blocks in various backing stores to achieve the desired replication level
- Rebalancer: move blocks to redistribute free space and reduce client probes
Reports:
- for managers: how much disk space is being conserved due to content-addressed storage (CAS)
- for managers: how much disk space is occupied in a given backing store service
- for managers: how disk usage would be affected by modifying storage policy
- for managers: how much disk space+time is used (per user, group, node, disk)
- for users: when the replication level or policy specified for a collection is not currently satisfied (and why, for how long, etc.)
- for users: how much disk space is represented by a given set of collections
- for users: how much disk space can be made available by garbage collection
- for users: how soon they should expect their cached data to disappear
- for users: performance statistics (how fast should I expect my job to read data?)
- for ops: where each block was most recently read/written, in case data recovery is needed
- for ops: how unbalanced the backing stores are across the cluster
- for ops: activity level and performance statistics
- for ops: activity level vs. amount of space (how much of the data is being accessed by users?)
- for ops: disk performance/error/status trends (and SMART reports) to help identify bad hardware
- for ops: history of disk adds, removals, moves
Information to track:
- Which blocks are used by which collections (and which collections are valued by which users/groups)
- Which blocks are stored in which services (local Keep, remote Keep, other storage service)
- Which blocks are stored on which disks
- Which disks are attached to which nodes
- Aggregate read/write activity per block and per disk (where applicable, e.g., block stored in local Keep)
- Exceptions (checksum mismatch, IO error)
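The garbage collector, the replication enforcer, and the CAS savings report all start from the same join: collection references on one side, what each storage service actually holds on the other. The following is a minimal in-memory sketch under stated assumptions: collections are given as `(replication, [locator, ...])` pairs, each service reports a plain list of block hashes, and a block's desired replication is taken to be the highest replication requested by any collection that references it. `BlockState`, `build_index`, `plan`, and `cas_savings` are hypothetical names for illustration, not keep-balance's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class BlockState:
    """Hypothetical per-block record built during one balancing pass."""
    size: int = 0     # bytes, from the locator (0 if only seen in a service index)
    desired: int = 0  # max replication requested by any referencing collection; 0 = unreferenced
    refs: int = 0     # number of collection references (used by the CAS report)
    stored_on: set = field(default_factory=set)  # IDs of services holding a copy


def parse_locator(locator):
    """Split '<hash>+<size>[+hints...]' into (hash, size)."""
    parts = locator.split("+")
    return parts[0], int(parts[1])


def build_index(collections, service_indexes):
    """Join collection references with what each storage service reports.

    collections:     {collection_id: (replication, [locator, ...])}
    service_indexes: {service_id: [block_hash, ...]}
    """
    blocks = {}
    for replication, locators in collections.values():
        for loc in locators:
            h, size = parse_locator(loc)
            st = blocks.setdefault(h, BlockState())
            st.size = size
            st.desired = max(st.desired, replication)
            st.refs += 1
    for service_id, hashes in service_indexes.items():
        for h in hashes:
            blocks.setdefault(h, BlockState()).stored_on.add(service_id)
    return blocks


def plan(blocks):
    """Classify blocks for the garbage collector and replication enforcer."""
    garbage, under, over = [], {}, {}
    for h in sorted(blocks):
        st = blocks[h]
        have = len(st.stored_on)
        if st.desired == 0:
            garbage.append(h)             # no collection references it
        elif have < st.desired:
            under[h] = st.desired - have  # copies to add
        elif have > st.desired:
            over[h] = have - st.desired   # surplus copies that could be trimmed
    return garbage, under, over


def cas_savings(blocks):
    """Bytes not stored because identical content is kept once per hash."""
    return sum(st.size * (st.refs - 1) for st in blocks.values() if st.refs > 1)


# Example with shortened, made-up hashes (real Keep hashes are 32 hex digits).
collections = {
    "coll-1": (2, ["aaa+100", "bbb+50"]),
    "coll-2": (3, ["aaa+100"]),
}
service_indexes = {
    "keep0": ["aaa", "bbb", "ccc"],
    "keep1": ["aaa", "bbb"],
    "keep2": ["bbb"],
}
blocks = build_index(collections, service_indexes)
garbage, under, over = plan(blocks)
# garbage == ["ccc"], under == {"aaa": 1}, over == {"bbb": 1}
# cas_savings(blocks) == 100
```

A real collector would also need a grace period before treating an unreferenced block as garbage, since a client may have just written it and not yet saved the collection that references it.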
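For the rebalancer's goal of reducing client probes, one approach is to put each block's copies on the services a client would try first for that block. The sketch below assumes clients rank services per block with rendezvous (highest-random-weight) hashing over MD5; the exact hash input real clients use is not specified here, so `probe_order` and `rebalance_moves` are hypothetical. It does not address free-space balancing.

```python
import hashlib


def probe_order(block_hash, services):
    """Rank service IDs for one block by rendezvous hashing (assumed scheme).

    Each (block, service) pair gets a pseudo-random weight; services are
    probed in descending weight order.
    """
    def weight(service):
        return hashlib.md5((block_hash + service).encode("utf-8")).hexdigest()
    return sorted(services, key=weight, reverse=True)


def rebalance_moves(block_hash, stored_on, services, replicas):
    """Return (copy_to, delete_from) that leave `replicas` copies of the block
    on the first services in its probe order.

    Callers should finish and verify every copy before deleting anything,
    so the block never drops below its current replication mid-move.
    """
    wanted = probe_order(block_hash, services)[:replicas]
    copy_to = [s for s in wanted if s not in stored_on]
    delete_from = sorted(s for s in stored_on if s not in wanted)
    return copy_to, delete_from


services = ["keep0", "keep1", "keep2", "keep3"]
copy_to, delete_from = rebalance_moves("aaa", {"keep0", "keep3"}, services, 2)
```

A useful property of rendezvous hashing here is that adding or removing one service only changes the preferred locations of blocks that rank that service near the top, so most blocks stay where they are.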
Implementation considerations
Overview
- REST service for queries
- All requests require authentication. Token validity is verified against the metadata server and cached locally.
- Subscribes to system event log
- Connects to the metadata server (using a system_user token), at least periodically, so that its view of which data is important is eventually consistent with the metadata database
- Persistent database
- In-memory database
- Easy to run multiple keep index services.
- Most features do not need synchronous operation or real-time data.
- Features that move or delete data should be tied to a single "primary" index service (a failover event will likely require resetting some state).
- Substantial disagreement between multiple index services should be easy to flag on the admin dashboard.
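The token check described above (verify against the metadata server, cache locally) can be sketched as a small time-to-live cache. The validator callable, the TTL values, and the choice to cache rejections for a shorter time are all assumptions; in a real service the validator would call the API server.

```python
import time


class TokenCache:
    """Remember token-validation results for a limited time.

    `validate` is any callable(token) -> bool. Rejections are cached for a
    shorter time so a newly issued token is not refused for long. There is
    no eviction here; a long-running service would also bound the size.
    """

    def __init__(self, validate, ttl_seconds=300.0, negative_ttl_seconds=10.0,
                 clock=time.monotonic):
        self._validate = validate
        self._ttl = ttl_seconds
        self._negative_ttl = negative_ttl_seconds
        self._clock = clock
        self._entries = {}  # token -> (is_valid, expires_at)

    def is_valid(self, token):
        now = self._clock()
        entry = self._entries.get(token)
        if entry is not None and now < entry[1]:
            return entry[0]  # fresh cached answer; no server round trip
        ok = bool(self._validate(token))
        ttl = self._ttl if ok else self._negative_ttl
        self._entries[token] = (ok, now + ttl)
        return ok
```

The trade-off is that a revoked token keeps working until its cache entry expires, so the positive TTL bounds how long revocation can lag.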
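One way to quantify "substantial disagreement" between two index services is the fraction of blocks whose recorded state differs. This sketch assumes each service can export a `{block_hash: replica_count}` snapshot; the function names and the 1% threshold are hypothetical placeholders.

```python
def disagreement(index_a, index_b):
    """Fraction of blocks whose recorded state differs between two index services.

    A block missing from one snapshot counts as a difference.
    """
    blocks = set(index_a) | set(index_b)
    if not blocks:
        return 0.0
    differing = sum(1 for h in blocks if index_a.get(h) != index_b.get(h))
    return differing / len(blocks)


def should_flag(index_a, index_b, threshold=0.01):
    """True when disagreement exceeds a (placeholder) threshold for the dashboard."""
    return disagreement(index_a, index_b) > threshold
```

Because most features do not need real-time data, snapshots taken at slightly different times will differ a little; a nonzero threshold tolerates that drift while still surfacing larger divergences.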
Updated by Tom Clegg over 7 years ago · 7 revisions