Keep-balance » History » Revision 4
Revision 3 (Anonymous, 04/17/2013 02:50 PM) → Revision 4/7 (Tom Clegg, 02/14/2014 10:22 AM)
h1. Data Manager

The Data Manager enforces policies concerning data access and generates reports about storage resource usage. The Data Manager interacts primarily with the [[Keep server]] and the metadata database. Clients/users do not interact with the Data Manager directly.

See also:

* [[Keep server]]
* [[Keep manifest format]]
* source: n/a (design phase)

Its responsibilities include:

* Garbage collector: decide what is eligible for deletion (and some partial order of preference)
* Replication enforcer: copy and delete blobs in various backing stores to achieve the desired replication level
* Rebalancer: move blobs to redistribute free space and reduce client probes
* Tell managers how much disk space is being conserved due to CAS
* Tell managers how much disk space is occupied in a given backing store service
* Tell managers how disk usage would be affected by modifying storage policy
* Tell managers how much disk space+time is used (per user, group, node, disk)
* Tell users when the replication/policy specified for a collection is not currently satisfied (and why, for how long, etc.)
* Tell users how much disk space is represented by a given set of collections
* Tell users how much disk space can be made available by garbage collection
* Tell users how soon they should expect their cached data to disappear
* Tell users performance statistics (how fast should I expect my job to read data?)
* Tell ops where each block was most recently read/written, in case data recovery is needed
* Tell ops how unbalanced the backing stores are across the cluster
* Tell ops activity level and performance statistics
* Tell ops activity level vs. amount of space (how much of the valuable data is being accessed by users?)
* Tell ops disk performance/error/status trends (and SMART reports) to help identify bad hardware
* Tell ops history of disk adds, removals, and moves

Basic kinds of data in the index:

* Which blocks are used by which collections (and which collections are valued by which users/groups)
* Which data blocks are stored on which disks
* Which disks are attached to which nodes
* Read events
* Write events
* Exceptions (checksum mismatch, IO error)

h2. Implementation considerations

Overview:

* REST service
* API server may cache/proxy some queries
* API server may redirect some queries

Permissions:

* Support +A tokens like [[Keep server]] when accepting collection/blob uuids in requests?
* Require an admin api_token for some queries, site-configurable?

Distributed/asynchronous:

* Easy to run multiple keep index services.
* Most features do not need synchronous operation / real-time data.
* Features that move or delete data should be tied to a single "primary" indexing service (a failover event likely requires resetting some state).
* Substantial disagreement between multiple index services should be easy to flag on the admin dashboard.