Story #16107
Updated by Nico César over 1 year ago
Requirements and high-level design
Each collection has one or more desired storage classes. The Keep blocks making up the collection inherit the storage classes from the collection.
* Each Keep volume is assigned one or more storage classes
* The system asynchronously copies/moves blocks between Keep volumes to fulfill the desired storage classes for each block.
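As a rough illustration of the model above, the sketch below derives each block's desired classes from the collections that reference it, then lists the classes that no volume currently holding the block provides. That gap is what the asynchronous copy/move step would need to close. All names here (`desired_classes_by_block`, `missing_classes`, the volume and block IDs) are hypothetical, and taking the union of classes when several collections share a block is an assumption, not something this story specifies.

```python
def desired_classes_by_block(collections):
    """Map each block to the storage classes it inherits.

    `collections` is an iterable of (storage_classes, block_ids) pairs.
    Assumption: a block shared by several collections wants the union
    of their classes.
    """
    desired = {}
    for classes, block_ids in collections:
        for block in block_ids:
            desired.setdefault(block, set()).update(classes)
    return desired


def missing_classes(desired, block_locations, volume_classes):
    """Return {block: classes not provided by any volume holding it}."""
    missing = {}
    for block, want in desired.items():
        have = set()
        for volume in block_locations.get(block, ()):
            have |= volume_classes[volume]
        gap = want - have
        if gap:
            missing[block] = gap
    return missing


# Hypothetical cluster: two volumes, two collections sharing block "b2".
volume_classes = {"v1": {"default"}, "v2": {"archive"}}
collections = [({"default"}, ["b1", "b2"]), ({"archive"}, ["b2"])]
block_locations = {"b1": ["v1"], "b2": ["v1"]}

desired = desired_classes_by_block(collections)
to_fix = missing_classes(desired, block_locations, volume_classes)
```

Here `to_fix` contains only "b2", which still needs a copy on a volume that provides "archive".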
When uploading a block, the Python and Go SDKs support passing a list of storage classes to influence placement of the data block.
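A minimal sketch of how a caller-supplied class list might be resolved before upload, assuming the fallback to the “default” class and the "at least one default volume" rule stated in the requirements below. The function names are hypothetical and do not correspond to the actual SDK API.

```python
DEFAULT_CLASS = "default"


def validate_volumes(volume_classes):
    """Raise if no configured volume provides the default class.

    `volume_classes` maps a (hypothetical) volume ID to its set of classes.
    """
    if not any(DEFAULT_CLASS in classes for classes in volume_classes.values()):
        raise ValueError("no volume provides the 'default' storage class")


def resolve_upload_classes(requested=None):
    """Return the classes to request for an upload.

    Falls back to the default class when the caller passes nothing.
    """
    return list(requested) if requested else [DEFAULT_CLASS]
```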
Each data block will be stored on volumes that fulfill the desired storage classes.
* A block that has two or more storage classes may be fulfilled by a single volume that fulfills all of its storage classes, or by multiple volumes that together cover every class.
* [#17349] The “replicas_desired” field expresses the number of replicas per storage class.
** A single volume can be configured to count as multiple replicas (existing behavior)
** A block assigned to two storage classes with N=2 replicas could have 2, 3 or 4 actual copies. For example, a block requesting storage classes A and B could be written to two volumes with storage classes “A and B”, or to two “A” volumes and two “B” volumes, or to one volume each of “A”, “B” and “A,B”.
** If sufficient replicas cannot be written for each storage class during upload, it is a fatal error (existing behavior)
* Storage classes are advisory and do not guarantee that a block will only be stored on a certain class or will never be stored on a certain class. Some circumstances where blocks may not be stored on only the requested storage class:
** User has specified an impossible situation, such as changing a collection to a storage class that isn’t fulfilled by any volume
** The same block may be referenced by multiple collections with different storage classes
** A storage class for a collection may have been changed but the system has not caught up to it yet
** In these cases the block will remain on its original storage volume
** There will be a reporting tool (keep-balance or other) that reports when there is a mismatch between the desired and actual storage classes for a block. The tool will also provide a way to report which blocks (and their associated collections) on a certain storage class also exist on a different storage class.
* When writing a block, the desired storage classes are passed in the keep service request
* Keepproxy will understand storage classes and forward blocks to the appropriate keepstore service.
* When reading a collection, the Python and Go SDKs support passing a list of storage classes to inform block lookup.
* There is always a “default” storage class. It is an error if there is not at least one volume with the “default” class.
** If an upload doesn’t specify a storage class it will use the “default” storage class
* Container and container request records gain a field specifying the storage class of the output collection.
* The following Arvados components will gain support for specifying storage classes for data upload: arv-put, arvados-cwl-runner, crunch-run
* Workbench 2 collection view will display the storage class of the collection.
* Work will be tested with unit, functional and integration tests using Amazon S3
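The replica-counting rule in the requirements above (replicas are counted per storage class, and one volume may count as several replicas) can be sketched as a small check. This is only an illustration under stated assumptions: `replicas_per_class` and `is_fulfilled` are hypothetical names, and each stored copy is modelled as a `(classes, replication)` pair describing the volume that holds it.

```python
from collections import Counter


def replicas_per_class(volumes):
    """Count replicas contributed to each storage class.

    `volumes` lists the volumes holding a copy of the block, each as a
    (classes, replication) pair; `replication` is how many replicas that
    volume is configured to count as.
    """
    counts = Counter()
    for classes, replication in volumes:
        for cls in classes:
            counts[cls] += replication
    return counts


def is_fulfilled(desired_classes, replicas_desired, volumes):
    """True if every desired class has at least `replicas_desired` replicas."""
    counts = replicas_per_class(volumes)
    return all(counts[cls] >= replicas_desired for cls in desired_classes)


# The three placements from the N=2, classes A and B example:
two_copies = [({"A", "B"}, 1), ({"A", "B"}, 1)]
three_copies = [({"A"}, 1), ({"B"}, 1), ({"A", "B"}, 1)]
four_copies = [({"A"}, 1), ({"A"}, 1), ({"B"}, 1), ({"B"}, 1)]

# One "A" volume and one "B" volume alone leave both classes short.
too_few = [({"A"}, 1), ({"B"}, 1)]
```

With these inputs, `is_fulfilled({"A", "B"}, 2, ...)` holds for the 2-, 3- and 4-copy placements and fails for `too_few`, which under the requirements would be a fatal error at upload time.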