
h1. Keep 

 Keep is a distributed content-addressable storage system designed for high performance in I/O-bound cluster environments. 

 Notable design goals and features include: 

 * High scalability 
 * Node-level redundancy 
 * Maximum overall throughput in a busy cluster environment 
 * Maximum data bandwidth from client to disk 
 * Minimum transaction overhead 
 * Elimination of disk thrashing (commonly caused by multiple simultaneous readers) 
 * Client-controlled redundancy 

 h2. Design 

 The above goals are accomplished by the following design features. 

 * Data are transferred directly between the client and the physical node where the disk is connected. 
 * Data collections are encoded in large (≤64 MiB) blocks to minimize short read/write operations. 
 * Each disk accepts only one block-read/write operation at a time. This prevents disk thrashing and maximizes total throughput when many clients compete for a disk. 
* Storage redundancy is directly controlled by the client, and can easily be verified, simply by reading or writing a block of data on multiple nodes.
* Data block distribution is computed from a cryptographic digest of the data block being stored or retrieved. This eliminates the need for a central or synchronized database of block storage locations (see the sketch after this list).
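
To make the digest-based distribution concrete, here is a minimal sketch of one way a client can map a block to storage nodes with no central lookup table. The node names, function names, and the ranking scheme (a rendezvous-hash-style ordering) are illustrative assumptions, not Keep's documented algorithm.

<pre><code class="python">
import hashlib

# Hypothetical node list; in a real cluster this comes from configuration.
KEEP_NODES = ["keep0.example.com", "keep1.example.com",
              "keep2.example.com", "keep3.example.com"]

def block_locator(data):
    """Content address of a block: MD5 digest plus size in bytes."""
    return "%s+%d" % (hashlib.md5(data).hexdigest(), len(data))

def probe_order(block_hash, nodes=KEEP_NODES):
    """Rank nodes by the digest of (block hash + node name); every client
    computes the same ordering, so no shared location database is needed."""
    return sorted(nodes, key=lambda n: hashlib.md5(
        (block_hash + n).encode()).hexdigest())

data = b"GATTACA"
locator = block_locator(data)
block_hash = locator.split("+")[0]
# A client wanting 2x redundancy uses the first two nodes in the ordering.
print(locator, probe_order(block_hash)[:2])
</code></pre>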

 h2. Components 

 The Keep storage system consists of data block read/write services, SDKs, and management agents. 

 The responsibilities of the Keep service are: 

 * Write data blocks 
* When writing: ensure data integrity by comparing the client-supplied cryptographic digest with a digest of the received data (sketched after this list)
 * Read data blocks (subject to permission, which is determined by the system/metadata DB) 
 * Send read/write/error event logs to management agents 
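
As an illustration of the integrity check on the write path, here is a minimal sketch; the handler name and the in-memory store are placeholders, not the real service internals.

<pre><code class="python">
import hashlib

STORE = {}  # stand-in for the on-disk block store

def handle_put(locator, data):
    """Recompute the digest of the received bytes and compare it with the
    client-supplied locator ("md5hex+size") before acknowledging the write."""
    expected = locator.split("+")[0]
    actual = hashlib.md5(data).hexdigest()
    if actual != expected:
        raise ValueError("digest mismatch: refusing corrupt block")
    STORE[actual] = data
    return locator

handle_put("acbd18db4cc2f85cedef654fccc4a4d8+3", b"foo")  # succeeds
</code></pre>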

 The responsibilities of the SDK are: 

* When writing: split data into ≤64 MiB chunks (see the sketch after this list)
 * When writing: encode directory trees as manifests 
 * When writing: write data to the desired number of nodes to achieve storage redundancy 
 * After writing: register a collection with Arvados 
 * When reading: parse manifests 
* When reading: verify data integrity by comparing the locator to the MD5 digest of the retrieved data
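
The write path described above can be sketched as follows; the function and constant names are illustrative, not the real SDK API.

<pre><code class="python">
import hashlib
import io

MAX_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB upper bound per block

def split_into_blocks(stream):
    """Cut the input into <=64 MiB blocks and compute a content-address
    locator for each; each block would then be written to as many Keep
    nodes as the desired storage redundancy requires."""
    locators = []
    while True:
        chunk = stream.read(MAX_BLOCK_SIZE)
        if not chunk:
            break
        locators.append("%s+%d" % (hashlib.md5(chunk).hexdigest(), len(chunk)))
    return locators

# A tiny input produces a single locator of the form "md5hex+12".
print(split_into_blocks(io.BytesIO(b"example data")))
</code></pre>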

 The responsibilities of management agents are: 

 * Verify validity of permission tokens 
 * Determine which blocks have higher or lower redundancy than required 
 * Monitor disk space and move or delete blocks as needed 
 * Collect per-user, per-group, per-node, and per-disk usage statistics 

 

 h2. Benefits  

Keep offers a variety of major benefits over POSIX file systems and other object storage systems. This is a summary of some of those benefits:

* *Elimination of Duplication* - One of the major storage management problems today is the duplication of data. Often researchers will make copies of data for backup or to re-organize files for different projects. Content addressing automatically eliminates unnecessary duplication: when a program saves a file, Keep checks whether an identical file has already been stored, and if so it simply reports success without writing a second copy.

* *Canonical Records* - Content addressing creates clear and verifiable canonical records for files. By combining Keep with the computation system in Arvados, it becomes trivial to verify the exact file that was used for a computation. By using a collection to define an entire data set (which could be hundreds of terabytes or petabytes), you maintain a permanent and verifiable record of which data were used for each computation. The file that defines a collection is very small relative to the underlying data, so you can make as many as you need.

* *Provenance* - The combination of Keep and the computation system makes it possible to maintain clear provenance for all the data in the system. This has a number of benefits, including making it easy to ascertain how data were derived at any point in time.

* *Easy Management of Temporary Data* - One benefit of systematic provenance tracking is that Arvados can automatically manage temporary and intermediate data. If you know how a data set or file was created, you can decide whether it is worthwhile to keep a copy on disk. Knowing what pipeline was run on which input data, how long it took, etc., makes it possible to automate such decisions.

* *Flexible Organization* - In Arvados, files are grouped in collections and can be easily tagged with metadata. Different researchers and research teams can manage independent sets of metadata. This makes it possible to organize files in a variety of different ways without duplicating or physically moving the data. Datasets are defined by creating collections. A collection is represented by a text file, which lists the filenames and data blocks comprising the collection, and is itself stored in Keep. As a result, the same underlying data can be referenced by many different collections without ever copying or moving the data itself (see the example manifest below).
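
As an illustration, the text representation of a collection can be as small as one line: a stream name, the locators of the data blocks, and the file names with their byte ranges within the concatenated blocks. The token layout below follows the Arvados manifest convention and is an example, not a specification; the two locators are the MD5 digests of the 3-byte files @foo.txt@ ("foo") and @bar.txt@ ("bar").

<pre>
. acbd18db4cc2f85cedef654fccc4a4d8+3 37b51d194a7513e45b56f6524f2d51f2+3 0:3:foo.txt 3:3:bar.txt
</pre>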

* *High Reliability* - By combining content addressing with an object store, Keep is fault tolerant across drive and even node failures. The Data Manager monitors the replication level of each data collection. Storage redundancy can thus be adjusted according to the relative importance of individual datasets in addition to default site policy.

* *Easier Tiering of Storage* - The Data Manager in Arvados manages the distribution of files to storage systems such as a NAS or cloud backup service. Because files are content addressed and tracked in the metadata database, when a pipeline uses data which is not on the cluster, Arvados can automatically pause that pipeline and move the necessary data onto the cluster before starting the job. This makes tiered storage feasible without imposing an undue burden on end users.

* *Security and Access Control* - Keep can encrypt files on disk, and this storage architecture makes the implementation of very fine-grained access control significantly easier than in traditional POSIX file systems.

* *POSIX Interface* - While it is a slower interface, collections in Keep can be mounted as POSIX filesystems in a virtual machine in order to access data with existing tools that expect a POSIX interface. Because collections are so flexible, one can easily create many different virtual directory structures for the same underlying files without copying or even reading the underlying data. Combining the native Arvados tools with UNIX pipes provides better performance, but the POSIX mount option is more convenient in some situations.

* *Data Sharing* - Keep makes it much easier to share data between clusters in different data centers or organizations. Keep content addresses can include information about which cluster data is stored on. With federated clusters, it is possible to define collections of data that reside on multiple clusters, and distributing computations across clusters can eliminate slow, costly data transfers (see the example below).
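
For example, a block locator can carry optional hints after the digest and size. A hint of the form +K@<cluster> identifies the cluster where the block is stored; the five-character cluster ID below is made up, and the exact hint syntax is shown as an assumption for illustration.

<pre>
acbd18db4cc2f85cedef654fccc4a4d8+3+K@xyzzy
</pre>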