Idea #16107

Updated by Nico César almost 4 years ago

Requirements and high level design 

 Each collection has one or more desired storage classes. The Keep blocks making up the collection inherit their storage classes from the collection.
 * Each Keep volume is assigned one or more storage classes.
 * The system asynchronously copies or moves blocks between Keep volumes to fulfill the desired storage classes for each block.


 When uploading a block, the Python and Go SDKs support passing a list of storage classes to influence placement of the data block. 
 Each data block will be stored on volumes that fulfill the desired storage classes. 

 * A block that has two or more storage classes may be fulfilled by a single volume that fulfills all of its storage classes, or by multiple volumes that together cover every storage class.
 * [#17349] The “replicas_desired” field expresses the number of replicas per storage class.
 ** A single volume can be configured to count as multiple replicas (existing behavior)
 ** [#17350 for keep-balance] A block assigned to two storage classes with replicas_desired = 2 could have 2, 3 or 4 actual copies. For example, a block requesting storage classes A and B could be written to two “A,B” volumes (2 copies), to one “A”, one “B” and one “A,B” volume (3 copies), or to two “A” volumes and two “B” volumes (4 copies).
 ** If sufficient replicas cannot be written for each storage class during upload, it is a fatal error (existing behavior) 
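
The per-class replica rule above can be sketched as a small check. This is a minimal illustration, not keepstore or keep-balance code: the function names and the `(storage_classes, replication)` tuple representation of a volume are assumptions made for this example.

```python
from collections import Counter


def replicas_per_class(volumes):
    """Count replicas per storage class for the volumes holding one block.

    `volumes` is a list of (storage_classes, replication) pairs (a
    hypothetical representation). `replication` is how many replicas the
    volume is configured to count as.
    """
    counts = Counter()
    for storage_classes, replication in volumes:
        for storage_class in storage_classes:
            counts[storage_class] += replication
    return counts


def satisfies(volumes, desired_classes, replicas_desired):
    """True if every desired class has at least `replicas_desired` replicas."""
    counts = replicas_per_class(volumes)
    return all(counts[c] >= replicas_desired for c in desired_classes)


# The three layouts from the example: 2, 3 and 4 actual copies.
two_ab = [({"A", "B"}, 1), ({"A", "B"}, 1)]
mixed = [({"A"}, 1), ({"B"}, 1), ({"A", "B"}, 1)]
separate = [({"A"}, 1), ({"A"}, 1), ({"B"}, 1), ({"B"}, 1)]

for layout in (two_ab, mixed, separate):
    print(len(layout), "copies:", satisfies(layout, ["A", "B"], 2))
```

All three layouts satisfy the request, while a single “A” volume plus a single “B” volume would not, because each class would then have only one replica.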

 * Storage classes are advisory: they do not guarantee that a block is stored only on a given class, or that it is never stored on a given class. Some circumstances where a block may not be stored only on the requested storage class:
 ** The user has specified an impossible situation, such as changing a collection to a storage class that isn’t fulfilled by any volume
 ** The same block may be referenced by multiple collections with different storage classes
 ** A storage class for a collection may have been changed but the system has not caught up to it yet
 ** In each of these cases the block will remain on its original storage volume
 ** There will be a reporting tool (keep-balance or other) that reports when there is a mismatch between the desired and actual storage classes for a block. The tool will also provide a way to report which blocks (and their associated collections) on a certain storage class also exist on a different storage class.
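
The kind of mismatch report described above could look roughly like the following sketch. It is not the keep-balance implementation; the function names, the input shapes, and the block and class names (“b1”, “hot”, “archive”) are all hypothetical and chosen only for illustration.

```python
def desired_classes_by_block(collections):
    """Merge desired classes per block across collections.

    `collections` is a list of (storage_classes, block_ids) pairs
    (hypothetical shape). A block referenced by several collections
    inherits the union of their storage classes.
    """
    desired = {}
    for storage_classes, block_ids in collections:
        for block_id in block_ids:
            desired.setdefault(block_id, set()).update(storage_classes)
    return desired


def storage_class_mismatches(desired, actual):
    """Report blocks whose actual classes differ from the desired ones.

    `actual` maps a block id to the set of storage classes of the
    volumes that currently hold it.
    """
    report = {}
    for block_id, want in desired.items():
        have = actual.get(block_id, set())
        missing = want - have
        extra = have - want
        if missing or extra:
            report[block_id] = {
                "missing": sorted(missing),
                "extra": sorted(extra),
            }
    return report


collections = [({"hot"}, ["b1", "b2"]), ({"archive"}, ["b2"])]
actual = {"b1": {"archive"}, "b2": {"hot", "archive"}}
report = storage_class_mismatches(desired_classes_by_block(collections), actual)
print(report)
```

Here “b2” is referenced by two collections and is consistent with both, while “b1” is reported because it sits only on a class that was not requested.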

 * When writing a block, the desired storage classes are passed in the keep service request.
 * Keepproxy will understand storage classes and forward blocks to the appropriate keepstore service.
 * When reading a collection, the Python and Go SDKs support passing a list of storage classes to inform block lookup.
 * There is always a “default” storage class. It is an error if there is not at least one volume with the “default” class.
 ** If an upload doesn’t specify a storage class it will use the “default” storage class
 * Container and container request records gain a field specifying the storage class of the output collection.
 * The following Arvados components will gain support for specifying storage classes for data upload: arv-put, arvados-cwl-runner, crunch-run
 * Workbench 2 collection view will display the storage class of the collection. 
 * Work will be tested with unit, functional and integration tests using Amazon S3.
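
The “default” storage class rules in the list above can be illustrated with a short sketch. The function names and the dictionary shape used for the volume configuration are assumptions for this example and do not reflect the actual Arvados configuration format.

```python
DEFAULT_CLASS = "default"


def validate_volumes(volume_classes):
    """Raise if no volume provides the "default" storage class.

    `volume_classes` maps a volume name to its set of storage classes
    (hypothetical shape).
    """
    if not any(DEFAULT_CLASS in classes for classes in volume_classes.values()):
        raise ValueError('at least one volume must have the "default" class')


def effective_classes(requested):
    """Return the requested classes, or ["default"] if none were given."""
    return list(requested) if requested else [DEFAULT_CLASS]


volumes = {"vol-a": {"default"}, "vol-b": {"archive"}}
validate_volumes(volumes)             # passes: vol-a provides "default"
print(effective_classes([]))          # upload without classes
print(effective_classes(["archive"])) # upload with an explicit class
```

An upload that names no storage class falls back to “default”, and a configuration with no “default” volume is rejected up front rather than at write time.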
