Idea #16107

closed

Storage classes

Added by Ward Vandewege almost 5 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
-
Target version:
-
Start date:
03/01/2021
Due date:
09/30/2021
Story points:
-
Release:
Release relationship:
Auto

Description

Requirements and high level design

Each collection has one or more desired storage classes. The Keep blocks making up the collection inherit the storage classes from the collection.
  • Each Keep volume is assigned one or more storage classes.
  • The system asynchronously copies/moves blocks between Keep volumes to fulfill the desired storage classes for each block.
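As a rough illustration of the inheritance rule above, the sketch below derives the desired storage classes for each block from the collections that reference it, then lists the blocks the asynchronous copy/move process would still have to act on. It assumes a block referenced by several collections wants the union of their classes; that rule, the dictionary layout, the function names, and the sample locators are hypothetical and are not taken from the Arvados code base.

```python
from collections import defaultdict


def desired_classes_per_block(collections):
    """Map each block locator to the union of the storage classes of
    every collection that references it (assumed inheritance rule)."""
    desired = defaultdict(set)
    for coll in collections:
        for locator in coll["blocks"]:
            desired[locator].update(coll["storage_classes_desired"])
    return dict(desired)


def pending_moves(desired, actual):
    """Return {locator: missing classes} for blocks whose current
    volumes do not yet cover every desired class."""
    missing = {}
    for locator, classes in desired.items():
        gap = classes - actual.get(locator, set())
        if gap:
            missing[locator] = gap
    return missing


# Hypothetical sample data: two collections share block "blk2".
collections = [
    {"storage_classes_desired": ["default"], "blocks": ["blk1", "blk2"]},
    {"storage_classes_desired": ["archive"], "blocks": ["blk2"]},
]
desired = desired_classes_per_block(collections)
actual = {"blk1": {"default"}, "blk2": {"default"}}
print(pending_moves(desired, actual))  # {'blk2': {'archive'}}
```

Here "blk2" still needs a copy on an "archive" volume, which is the kind of gap the background process is expected to close.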

When uploading a block, the Python and Go SDKs support passing a list of storage classes to influence placement of the data block.
Each data block will be stored on volumes that fulfill the desired storage classes.
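One way to picture the placement step is a greedy choice of volumes that together cover every requested class. This is only a sketch under stated assumptions: the volume names, the `choose_volumes` helper, and the greedy strategy are invented for illustration and do not describe how keepstore actually selects volumes.

```python
def choose_volumes(desired_classes, volumes):
    """Greedily pick volumes until every desired class is covered.

    `volumes` maps a (hypothetical) volume name to the set of storage
    classes it fulfills. Raises ValueError if a class cannot be covered.
    """
    remaining = set(desired_classes)
    chosen = []
    while remaining:
        if not volumes:
            raise ValueError("no volumes configured")
        # Prefer the volume covering the most still-uncovered classes;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(volumes), key=lambda name: len(volumes[name] & remaining))
        covered = volumes[best] & remaining
        if not covered:
            raise ValueError(f"no volume fulfills: {sorted(remaining)}")
        chosen.append(best)
        remaining -= covered
    return chosen


volumes = {"vol-a": {"A"}, "vol-ab": {"A", "B"}, "vol-b": {"B"}}
print(choose_volumes(["A", "B"], volumes))  # ['vol-ab']
print(choose_volumes(["A", "B"], {"vol-a": {"A"}, "vol-b": {"B"}}))  # ['vol-a', 'vol-b']
```

The first call is satisfied by one volume that carries both classes; the second needs one volume per class.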

  • A block that has two or more storage classes may be fulfilled by a single volume that fulfills all of the storage classes, or by multiple volumes that each fulfill one or more of them.
  • [#17349] The “replicas_desired” field expresses the number of replicas per storage class.
    • A single volume can be configured to count as multiple replicas (existing behavior)
    • [#17350 for keep-balance] A block assigned to two storage classes with N=2 replicas could have 2, 3 or 4 actual copies. For example, a block requesting storage classes A and B could be written to two volumes with storage classes “A and B”, or two “A” volumes and two “B” volumes, or “A”, “B” and “A,B”.
    • If sufficient replicas cannot be written for each storage class during upload, it is a fatal error (existing behavior)
  • Storage classes are advisory and do not guarantee that a block will only be stored on a certain class or will never be stored on a certain class. Some circumstances where blocks may not be stored on only the requested storage class:
    • User has specified an impossible situation, such as changing a collection to a storage class that isn’t fulfilled by any volume
    • The same block may be referenced by multiple collections with different storage classes
    • A storage class for a collection may have been changed but the system has not caught up to it yet
    • In these cases the block will remain on its original storage volume
    • There will be a reporting tool (keep-balance or other) that reports when there is a mismatch between the desired and actual storage classes for a block. The tool will also provide a way to report which blocks (associated collections) on a certain storage class also exist on a different storage class.
  • When writing a block, the desired storage classes are passed in the keep service request
  • [#17389] Keepproxy will understand storage classes and forward blocks to the appropriate keepstore service.
  • When reading a collection, the Python and Go SDKs support passing a list of storage classes to inform block lookup.
  • There is always a “default” storage class. It is an error if there is not at least one volume with the “default” class.
    • If an upload doesn’t specify a storage class it will use the ‘default’ storage class
  • Container and container request records gain a field specifying the storage class of the output collection.
  • [#17351][#17390] The following Arvados components will gain support for specifying storage classes for data upload: arv-put, arvados-cwl-runner, crunch-run
  • Workbench 2 collection view will display the storage class of the collection.
  • The work will be covered by unit, functional, and integration tests using Amazon S3.
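The replica arithmetic in the "replicas_desired" item above (2, 3 or 4 actual copies for classes A and B with N=2) can be checked with a small sketch. It assumes each volume holding the block is described by the set of classes it fulfills and the number of replicas it counts as; the function names and tuple layout are hypothetical.

```python
def replicas_per_class(desired_classes, holding_volumes):
    """Count replicas per desired class.

    `holding_volumes` is a list of (classes, replication) tuples, one per
    volume that holds a copy of the block. A volume contributes its
    replication count to every desired class it fulfills.
    """
    return {
        cls: sum(rep for classes, rep in holding_volumes if cls in classes)
        for cls in desired_classes
    }


def is_satisfied(desired_classes, replicas_desired, holding_volumes):
    """True if every desired class has at least `replicas_desired` replicas."""
    counts = replicas_per_class(desired_classes, holding_volumes)
    return all(n >= replicas_desired for n in counts.values())


A, B, AB = {"A"}, {"B"}, {"A", "B"}
two_copies = [(AB, 1), (AB, 1)]                    # two "A and B" volumes
three_copies = [(A, 1), (B, 1), (AB, 1)]           # "A", "B" and "A,B"
four_copies = [(A, 1), (A, 1), (B, 1), (B, 1)]     # two "A" and two "B" volumes

for layout in (two_copies, three_copies, four_copies):
    print(len(layout), is_satisfied(["A", "B"], 2, layout))
# 2 True
# 3 True
# 4 True

# A single volume configured to count as two replicas also satisfies N=2.
print(is_satisfied(["A", "B"], 2, [(AB, 2)]))  # True
# Two "A" volumes leave class "B" with zero replicas.
print(is_satisfied(["A", "B"], 2, [(A, 1), (A, 1)]))  # False
```

A reporting tool of the kind described above could flag any block for which `is_satisfied` is false, or for which `replicas_per_class` shows copies on a class that was not requested.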

Related issues: 24 (0 open, 24 closed)

Related to Arvados - Feature #17349: [Keep API] Revisit "replicas_desired" and "storage_classes_desired". (Rejected)
Related to Arvados - Feature #17350: [keep-balance] Expected behaviour with different "replicas_desired" and "storage_classes_desired" values (Rejected)
Related to Arvados - Feature #17351: [arv-put] Storage classes (Resolved; Lucas Di Pentima; 06/03/2021)
Related to Arvados - Feature #17388: [arv-copy] Storage classes revisit (Resolved; Lucas Di Pentima; 07/23/2021)
Related to Arvados - Feature #17389: Storage classes support in keepproxy (Resolved; Lucas Di Pentima; 06/18/2021)
Related to Arvados - Bug #17390: Set storage classes for intermediates and final outputs (Resolved; Peter Amstutz; 08/12/2021)
Related to Arvados - Feature #17391: [keepstore] expose the volume storage classes (Closed)
Related to Arvados - Feature #17392: Support writing blocks to correct storage classes in Go SDK (Resolved; Tom Clegg; 04/12/2021)
Related to Arvados - Feature #13382: [keepstore] Write new blocks to appropriate storage class (Resolved; Tom Clegg; 04/02/2021)
Related to Arvados - Feature #17394: Go SDK CollectionFS writes files to correct storage class (Resolved; Tom Clegg; 07/15/2021)
Related to Arvados - Feature #17395: Control storage class of container / container_request output (Resolved; Peter Amstutz; 06/30/2021)
Related to Arvados - Feature #17393: Go and Python SDK propagate correct storage class to keepstore (Duplicate)
Related to Arvados - Support #17447: Scoping/grooming storage classes (Resolved; Tom Clegg)
Related to Arvados - Idea #17465: Support writing blocks to correct storage classes in Python SDK (Resolved; Lucas Di Pentima; 06/01/2021)
Related to Arvados - Feature #11184: [Keep] Support multiple storage classes (Resolved; Tom Morris)
Related to Arvados - Feature #17572: arv-mount understands storage classes (Resolved; Lucas Di Pentima; 06/21/2021)
Related to Arvados - Idea #17573: User interface for exposing / changing storage classes on a collection (Resolved; Lucas Di Pentima; 07/15/2021)
Related to Arvados - Feature #17574: keep-balance updates API server with correct storage_classes_confirmed (Resolved; Tom Clegg; 07/29/2021)
Related to Arvados - Feature #17696: Exported config has default storage class(es), SDKs use the configured default storage class if not overridden (Resolved; Lucas Di Pentima; 08/19/2021)
Related to Arvados - Idea #17697: Design for reporting tools to determine what data is on multiple storage classes. (Resolved; Ward Vandewege)
Related to Arvados - Feature #17698: Parallelize writes inside keepstore when there are writes that request multiple storage classes. (Resolved; Tom Clegg; 08/13/2021)
Related to Arvados - Feature #17967: Prioritize reads from different storage classes (Resolved; Tom Clegg; 08/05/2021)
Blocked by Arvados - Feature #17994: [api] storage class fields should be supported in filters (Resolved; Tom Clegg; 08/27/2021)
Blocked by Arvados - Feature #17995: [api] add method to get collections where replication_confirmed < replication_desired (Resolved; Tom Clegg; 08/27/2021)

Actions #1

Updated by Peter Amstutz almost 5 years ago

  • Start date changed from 08/01/2020 to 10/01/2020
  • Due date changed from 11/30/2020 to 12/31/2020
Actions #2

Updated by Peter Amstutz almost 5 years ago

  • Start date changed from 10/01/2020 to 11/01/2020
  • Due date changed from 12/31/2020 to 01/31/2021
Actions #3

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 11/01/2020 to 02/01/2021
  • Due date changed from 01/31/2021 to 04/30/2021
Actions #4

Updated by Peter Amstutz over 4 years ago

  • Subject changed from Storage classes to Storage classes / glacier
Actions #5

Updated by Peter Amstutz over 4 years ago

  • Start date changed from 02/01/2021 to 01/01/2021
  • Due date changed from 04/30/2021 to 03/31/2021
Actions #6

Updated by Peter Amstutz about 4 years ago

  • Due date changed from 03/31/2021 to 06/30/2021
Actions #7

Updated by Peter Amstutz about 4 years ago

  • Subject changed from Storage classes / glacier to Storage classes
Actions #8

Updated by Peter Amstutz about 4 years ago

  • Start date changed from 01/01/2021 to 03/01/2021
Actions #9

Updated by Nico César almost 4 years ago

  • Description updated (diff)
Actions #10

Updated by Nico César almost 4 years ago

  • Related to Feature #17349: [Keep API] Revisit "replicas_desired" and "storage_classes_desired". added
Actions #11

Updated by Nico César almost 4 years ago

  • Related to Feature #17350: [keep-balance] Expected behaviour with different "replicas_desired" and "storage_classes_desired" values added
Actions #12

Updated by Nico César almost 4 years ago

  • Description updated (diff)
Actions #13

Updated by Nico César almost 4 years ago

Actions #14

Updated by Nico César almost 4 years ago

Actions #15

Updated by Nico César almost 4 years ago

  • Related to Feature #17389: Storage classes support in keepproxy added
Actions #16

Updated by Nico César almost 4 years ago

  • Description updated (diff)
Actions #17

Updated by Nico César almost 4 years ago

  • Related to Bug #17390: Set storage classes for intermediates and final outputs added
Actions #18

Updated by Nico César almost 4 years ago

  • Description updated (diff)
Actions #19

Updated by Nico César almost 4 years ago

  • Description updated (diff)
Actions #20

Updated by Nico César almost 4 years ago

  • Related to Feature #17391: [keepstore] expose the volume storage classes added
Actions #21

Updated by Nico César almost 4 years ago

  • Related to Feature #17392: Support writing blocks to correct storage classes in Go SDK added
Actions #22

Updated by Nico César almost 4 years ago

  • Related to Feature #13382: [keepstore] Write new blocks to appropriate storage class added
Actions #23

Updated by Peter Amstutz almost 4 years ago

  • Related to Feature #17394: Go SDK CollectionFS writes files to correct storage class added
Actions #24

Updated by Peter Amstutz almost 4 years ago

  • Related to Feature #17395: Control storage class of container / container_request output added
Actions #25

Updated by Peter Amstutz almost 4 years ago

  • Related to Feature #17393: Go and Python SDK propagate correct storage class to keepstore added
Actions #26

Updated by Peter Amstutz almost 4 years ago

Actions #27

Updated by Peter Amstutz almost 4 years ago

  • Related to Idea #17465: Support writing blocks to correct storage classes in Python SDK added
Actions #28

Updated by Tom Clegg over 3 years ago

  • Related to Feature #11184: [Keep] Support multiple storage classes added
Actions #29

Updated by Peter Amstutz over 3 years ago

  • Related to Feature #17572: arv-mount understands storage classes added
Actions #30

Updated by Peter Amstutz over 3 years ago

  • Related to Idea #17573: User interface for exposing / changing storage classes on a collection added
Actions #31

Updated by Peter Amstutz over 3 years ago

  • Related to Feature #17574: keep-balance updates API server with correct storage_classes_confirmed added
Actions #32

Updated by Peter Amstutz over 3 years ago

Customer request: the ability to specify the desired default storage classes, which can be more than one.

Actions #33

Updated by Peter Amstutz over 3 years ago

Reporting tools to determine what data is on multiple storage classes.

Actions #34

Updated by Peter Amstutz over 3 years ago

Way to prioritize reading data from volumes with faster storage classes.

Actions #35

Updated by Peter Amstutz over 3 years ago

Parallelize writes inside keepstore when there are writes that request multiple storage classes.

Actions #36

Updated by Peter Amstutz over 3 years ago

There is a use case for crunch-run starting its own keepstore server.

Actions #37

Updated by Peter Amstutz over 3 years ago

  • Related to Feature #17696: Exported config has default storage class(es), SDKs use the configured default storage class if not overridden added
Actions #38

Updated by Peter Amstutz over 3 years ago

  • Related to Idea #17697: Design for reporting tools to determine what data is on multiple storage classes. added
Actions #39

Updated by Peter Amstutz over 3 years ago

  • Related to Feature #17698: Parallelize writes inside keepstore when there are writes that request multiple storage classes. added
Actions #40

Updated by Peter Amstutz over 3 years ago

  • Due date changed from 06/30/2021 to 07/31/2021
Actions #41

Updated by Peter Amstutz over 3 years ago

  • Due date changed from 07/31/2021 to 08/31/2021
Actions #42

Updated by Peter Amstutz over 3 years ago

  • Related to Feature #17967: Prioritize reads from different storage classes added
Actions #43

Updated by Ward Vandewege over 3 years ago

  • Blocked by Feature #17993: [deduplication-report] supports storage classes added
Actions #44

Updated by Ward Vandewege over 3 years ago

  • Blocked by Feature #17994: [api] storage class fields should be supported in filters added
Actions #45

Updated by Ward Vandewege over 3 years ago

  • Blocked by Feature #17995: [api] add method to get collections where replication_confirmed < replication_desired added
Actions #46

Updated by Peter Amstutz over 3 years ago

  • Due date changed from 08/31/2021 to 09/30/2021
Actions #47

Updated by Peter Amstutz about 3 years ago

  • Status changed from New to In Progress
Actions #48

Updated by Peter Amstutz about 3 years ago

  • Blocked by deleted (Feature #17993: [deduplication-report] supports storage classes)
Actions #49

Updated by Peter Amstutz about 3 years ago

  • Status changed from In Progress to Resolved