Feature #17696
closedExported config has default storage class(es), SDKs use the configured default storage class if not overridden
Description
Use case: the default storage classes should be configurable in the config file, and there can be more than one.
- api/controller reads Collections/DefaultStorageClasses map[string]bool. The default value should be {"default":true} (aka one storage class named "default").
- out of the box, if Collections/DefaultStorageClasses is empty, and Volumes/XXXX/StorageClasses is empty, the implied value for both of those is "default". This should be done as part of config load. Using config-dump will show these "default" values. The config on the website will just have this listed in the comments, not in the sample config file.
- when new collections are stored (and no storage classes are requested) the default values should be added to the desired storage classes
- update those clients that send "default" as the storage class to send nothing instead
- need a migration? for existing collections with storage class "default". Add a python recipe to the cookbook to change a particular storage class (e.g. "default") to something else.
- add some tests
- validation on the config-check:
- Make sure the default class(es) are implemented by a volume.
- If the config mentions an explicit storage class on a volume, all volumes must have storage classes defined (warning or error?) and Collections/DefaultStorageClasses must be set explicitly.
This gives us a short configuration (nothing set) that implies "default" in the simple case, but as soon as someone adds an additional storage class, we protect them from mistakes by forcing them to be explicit about all the storage classes in use.
Related issues
Updated by Peter Amstutz over 3 years ago
- Related to Idea #16107: Storage classes added
Updated by Peter Amstutz over 3 years ago
- Target version set to 2021-08-18 sprint
Updated by Ward Vandewege over 3 years ago
- Story points set to 3.0
- Description updated (diff)
Updated by Lucas Di Pentima over 3 years ago
- Assigned To set to Lucas Di Pentima
Updated by Lucas Di Pentima over 3 years ago
- Blocked by Task #17978: Add storage class default/priority fields to config + exported config added
Updated by Peter Amstutz over 3 years ago
- Blocked by Feature #17967: Prioritize reads from different storage classes added
Updated by Tom Clegg over 3 years ago
With #17967, the exported config has defaults, but keepstore and apiserver both use the default storage classes if the caller doesn't specify anything -- so the SDKs probably don't even need to look up the defaults.
Updated by Lucas Di Pentima over 3 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 3 years ago
Tom Clegg wrote:
With #17967, the exported config has defaults, but keepstore and apiserver both use the default storage classes if the caller doesn't specify anything -- so the SDKs probably don't even need to look up the defaults.
Let's think this through.
- The client picks a random server (rendezvous hashed) to write to
- The server sees there are no storage classes specified, so it assumes the client wants the default ones
- The server doesn't offer any of the default storage classes
- The servers sends back an error (what error?)
- The client tries the next server in the list
- The next server offers one default storage class but not the other
- It sends back success with replication=2 (because that's what the storage class's replication is)
- The client declares success (but didn't write to both storage classes)
So I think it would be better than nothing for backwards compatibility, but in some situations it would not do the right thing.
Updated by Tom Clegg over 3 years ago
Peter Amstutz wrote:
- The client declares success (but didn't write to both storage classes)
Yeah, good call. The client does need to track storage classes after all.
Updated by Lucas Di Pentima over 3 years ago
- Target version changed from 2021-08-18 sprint to 2021-09-01 sprint
Updated by Lucas Di Pentima over 3 years ago
Updates at 5ac636b5a - branch 17696-pysdk-default-storage-class
Test run: developer-run-tests: #2644
- Updates Python SDK to request the configured storage classes when none are explicitly requested.
- Updates the documentation on the collection's API section.
- Adds & updates tests.
Updated by Lucas Di Pentima over 3 years ago
Updated by Lucas Di Pentima over 3 years ago
Update on PySDK branch¶
Updates at 742be53
Test run: developer-run-tests: #2645
- Dropped the commit about changing the
StorageClasses.SAMPLE.Default
value tofalse
because it made some tests fail.
Updated by Lucas Di Pentima about 3 years ago
Updates at 3b492e126 - branch 17696-gosdk-default-storage-class
Test run: developer-run-tests: #2647
- Adds
ClusterConfig()
function toarvadosclient
. - Loads configured default storage classes when creating a new
KeepClient
. - Adds tests.
Updated by Peter Amstutz about 3 years ago
Lucas Di Pentima wrote:
Updates at 5ac636b5a - branch
17696-pysdk-default-storage-class
Test run: developer-run-tests: #2644
- Updates Python SDK to request the configured storage classes when none are explicitly requested.
- Updates the documentation on the collection's API section.
- Adds & updates tests.
I think we also need the Collection class to take its storage classes from the default. In the current code, a new Collection record will write blocks to the default storage class from the config, but the record will be written to the API with empty storage classes, and we assume the API server will update it to the same default storage classes that the blocks were written to. If, for some reason, the API server behaved differently in setting the default, all the blocks would all have been written to the wrong place.
If the Collection object gets its default storage classes from the config (to be used when not explicitly specified) then both the keep client and Collection object will use the same storage classes.
Updated by Peter Amstutz about 3 years ago
Lucas Di Pentima wrote:
Updates at 3b492e126 - branch
17696-gosdk-default-storage-class
Test run: developer-run-tests: #2647
- Adds
ClusterConfig()
function toarvadosclient
.- Loads configured default storage classes when creating a new
KeepClient
.- Adds tests.
It hurts a bit that we're adding new features to arvadosclient instead of consolidating it with arvados.Client, maybe someday we'll find time to fix that.
Same comment goes for the Go SDK as for the Python SDK, the arvados.Collection.FileSystem() method should set c.StorageClassesDesired to the config default instead of assuming that keepclient and API server will agree.
Updated by Lucas Di Pentima about 3 years ago
Peter Amstutz wrote:
I think we also need the Collection class to take its storage classes from the default. In the current code, a new Collection record will write blocks to the default storage class from the config, but the record will be written to the API with empty storage classes, and we assume the API server will update it to the same default storage classes that the blocks were written to. If, for some reason, the API server behaved differently in setting the default, all the blocks would all have been written to the wrong place.
If the Collection object gets its default storage classes from the config (to be used when not explicitly specified) then both the keep client and Collection object will use the same storage classes.
Updates at 99e0417c6 (branch 17696-pysdk-default-storage-class
)
Test run: developer-run-tests: #2654
- Adds default storage classes usage on
Collection
class. - Adds test.
Updated by Lucas Di Pentima about 3 years ago
Updated by Lucas Di Pentima about 3 years ago
After discussion with the team, we decided that the default storage classes handling will be left to just the keep clients so I've dropped the latest commits and rebase to main
.
PySDK side at 5209c88 - branch 17696-pysdk-default-storage-class
Test run: developer-run-tests: #2655
Updated by Lucas Di Pentima about 3 years ago
Updates at 913c7638c (GoSDK branch)
Test run: developer-run-tests: #2659
- Adds integration test that proves that a Collection is saved with the same
storage_classes_desired
list as the KeepClient uses. - In doing the above, I realized that the
kc.loadDefaultSorageClasses()
call should be done onkeepclient.New()
. - Updates previously added test.
Updated by Peter Amstutz about 3 years ago
Lucas Di Pentima wrote:
Updates at 913c7638c (GoSDK branch)
Test run: developer-run-tests: #2659
- Adds integration test that proves that a Collection is saved with the same
storage_classes_desired
list as the KeepClient uses.- In doing the above, I realized that the
kc.loadDefaultSorageClasses()
call should be done onkeepclient.New()
.- Updates previously added test.
Is KeepClient.loadDefaultClasses() backwards compatible? I see it prints a warning in that case, and DefaultStorageClasses will be nil. Is that handled properly? Glancing at BlockWrite, I'm not confident that it is. Does it need to make sure to set [default] ?
Updated by Peter Amstutz about 3 years ago
Lucas Di Pentima wrote:
After discussion with the team, we decided that the default storage classes handling will be left to just the keep clients so I've dropped the latest commits and rebase to
main
.PySDK side at 5209c88 - branch
17696-pysdk-default-storage-class
Test run: developer-run-tests: #2655
LGTM
Updated by Lucas Di Pentima about 3 years ago
Peter Amstutz wrote:
Is KeepClient.loadDefaultClasses() backwards compatible? I see it prints a warning in that case, and DefaultStorageClasses will be nil. Is that handled properly? Glancing at BlockWrite, I'm not confident that it is. Does it need to make sure to set [default] ?
Updates at 6610ed6 (rebased)
Test run: developer-run-tests: #2666
Test re-run: developer-run-tests-remainder: #2772
- Updated (& fixed) tests that simulate the case when talking to an old cluster that doesn't have storage classes keep support & exported config defaults.
Updated by Lucas Di Pentima about 3 years ago
- Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Updated by Peter Amstutz about 3 years ago
Lucas Di Pentima wrote:
Peter Amstutz wrote:
Is KeepClient.loadDefaultClasses() backwards compatible? I see it prints a warning in that case, and DefaultStorageClasses will be nil. Is that handled properly? Glancing at BlockWrite, I'm not confident that it is. Does it need to make sure to set [default] ?
Updates at 6610ed6 (rebased)
Test run: developer-run-tests: #2666
Test re-run: developer-run-tests-remainder: #2772
- Updated (& fixed) tests that simulate the case when talking to an old cluster that doesn't have storage classes keep support & exported config defaults.
LGTM
Updated by Lucas Di Pentima about 3 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|dc1c6914b00b900978c5ee044e39f09dbec62b33.