Feature #17696
closed
Exported config has default storage class(es), SDKs use the configured default storage class if not overridden
Added by Peter Amstutz over 3 years ago.
Updated about 3 years ago.
Release relationship:
Auto
Description
Use case: the default storage classes should be configurable in the config file, and there can be more than one.
- api/controller reads Collections/DefaultStorageClasses map[string]bool. The default value should be {"default":true} (aka one storage class named "default").
- out of the box, if Collections/DefaultStorageClasses is empty, and Volumes/XXXX/StorageClasses is empty, the implied value for both of those is "default". This should be done as part of config load. Using config-dump will show these "default" values. The config on the website will just have this listed in the comments, not in the sample config file.
- when new collections are stored (and no storage classes are requested) the default values should be added to the desired storage classes
- update those clients that send "default" as the storage class to send nothing instead
- need a migration? for existing collections with storage class "default". Add a python recipe to the cookbook to change a particular storage class (e.g. "default") to something else.
- add some tests
- validation on the config-check:
- Make sure the default class(es) are implemented by a volume.
- If the config mentions an explicit storage class on a volume, all volumes must have storage classes defined (warning or error?) and Collections/DefaultStorageClasses must be set explicitly.
This gives us a short configuration (nothing set) that implies "default" in the simple case, but as soon as someone adds an additional storage class, we protect them from mistakes by forcing them to be explicit about all the storage classes in use.
- Target version set to 2021-08-18 sprint
- Description updated (diff)
- Story points set to 3.0
- Description updated (diff)
- Tracker changed from Idea to Feature
- Assigned To set to Lucas Di Pentima
- Blocked by Task #17978: Add storage class default/priority fields to config + exported config added
- Blocked by Feature #17967: Prioritize reads from different storage classes added
With #17967, the exported config has defaults, but keepstore and apiserver both use the default storage classes if the caller doesn't specify anything -- so the SDKs probably don't even need to look up the defaults.
- Status changed from New to In Progress
Tom Clegg wrote:
With #17967, the exported config has defaults, but keepstore and apiserver both use the default storage classes if the caller doesn't specify anything -- so the SDKs probably don't even need to look up the defaults.
Let's think this through.
- The client picks a random server (rendezvous hashed) to write to
- The server sees there are no storage classes specified, so it assumes the client wants the default ones
- The server doesn't offer any of the default storage classes
- The servers sends back an error (what error?)
- The client tries the next server in the list
- The next server offers one default storage class but not the other
- It sends back success with replication=2 (because that's what the storage class's replication is)
- The client declares success (but didn't write to both storage classes)
So I think it would be better than nothing for backwards compatibility, but in some situations it would not do the right thing.
Peter Amstutz wrote:
- The client declares success (but didn't write to both storage classes)
Yeah, good call. The client does need to track storage classes after all.
- Target version changed from 2021-08-18 sprint to 2021-09-01 sprint
Updates at 5ac636b5a - branch 17696-pysdk-default-storage-class
Test run: developer-run-tests: #2644
- Updates Python SDK to request the configured storage classes when none are explicitly requested.
- Updates the documentation on the collection's API section.
- Adds & updates tests.
Pending¶
- Migration cookbook recipe: Depends on #17994 -- Maybe this task should be on its own story blocked by it.
- Go SDK (in progress)
Update on PySDK branch¶
Updates at 742be53
Test run: developer-run-tests: #2645
- Dropped the commit about changing the
StorageClasses.SAMPLE.Default
value to false
because it made some tests fail.
Updates at 3b492e126 - branch 17696-gosdk-default-storage-class
Test run: developer-run-tests: #2647
- Adds
ClusterConfig()
function to arvadosclient
.
- Loads configured default storage classes when creating a new
KeepClient
.
- Adds tests.
Lucas Di Pentima wrote:
Updates at 5ac636b5a - branch 17696-pysdk-default-storage-class
Test run: developer-run-tests: #2644
- Updates Python SDK to request the configured storage classes when none are explicitly requested.
- Updates the documentation on the collection's API section.
- Adds & updates tests.
I think we also need the Collection class to take its storage classes from the default. In the current code, a new Collection record will write blocks to the default storage class from the config, but the record will be written to the API with empty storage classes, and we assume the API server will update it to the same default storage classes that the blocks were written to. If, for some reason, the API server behaved differently in setting the default, all the blocks would all have been written to the wrong place.
If the Collection object gets its default storage classes from the config (to be used when not explicitly specified) then both the keep client and Collection object will use the same storage classes.
Lucas Di Pentima wrote:
Updates at 3b492e126 - branch 17696-gosdk-default-storage-class
Test run: developer-run-tests: #2647
- Adds
ClusterConfig()
function to arvadosclient
.
- Loads configured default storage classes when creating a new
KeepClient
.
- Adds tests.
It hurts a bit that we're adding new features to arvadosclient instead of consolidating it with arvados.Client, maybe someday we'll find time to fix that.
Same comment goes for the Go SDK as for the Python SDK, the arvados.Collection.FileSystem() method should set c.StorageClassesDesired to the config default instead of assuming that keepclient and API server will agree.
Peter Amstutz wrote:
I think we also need the Collection class to take its storage classes from the default. In the current code, a new Collection record will write blocks to the default storage class from the config, but the record will be written to the API with empty storage classes, and we assume the API server will update it to the same default storage classes that the blocks were written to. If, for some reason, the API server behaved differently in setting the default, all the blocks would all have been written to the wrong place.
If the Collection object gets its default storage classes from the config (to be used when not explicitly specified) then both the keep client and Collection object will use the same storage classes.
Updates at 99e0417c6 (branch 17696-pysdk-default-storage-class
)
Test run: developer-run-tests: #2654
- Adds default storage classes usage on
Collection
class.
- Adds test.
After discussion with the team, we decided that the default storage classes handling will be left to just the keep clients so I've dropped the latest commits and rebase to main
.
PySDK side at 5209c88 - branch 17696-pysdk-default-storage-class
Test run: developer-run-tests: #2655
Updates at 913c7638c (GoSDK branch)
Test run: developer-run-tests: #2659
- Adds integration test that proves that a Collection is saved with the same
storage_classes_desired
list as the KeepClient uses.
- In doing the above, I realized that the
kc.loadDefaultSorageClasses()
call should be done on keepclient.New()
.
- Updates previously added test.
Lucas Di Pentima wrote:
Updates at 913c7638c (GoSDK branch)
Test run: developer-run-tests: #2659
- Adds integration test that proves that a Collection is saved with the same
storage_classes_desired
list as the KeepClient uses.
- In doing the above, I realized that the
kc.loadDefaultSorageClasses()
call should be done on keepclient.New()
.
- Updates previously added test.
Is KeepClient.loadDefaultClasses() backwards compatible? I see it prints a warning in that case, and DefaultStorageClasses will be nil. Is that handled properly? Glancing at BlockWrite, I'm not confident that it is. Does it need to make sure to set [default] ?
Lucas Di Pentima wrote:
After discussion with the team, we decided that the default storage classes handling will be left to just the keep clients so I've dropped the latest commits and rebase to main
.
PySDK side at 5209c88 - branch 17696-pysdk-default-storage-class
Test run: developer-run-tests: #2655
LGTM
Peter Amstutz wrote:
Is KeepClient.loadDefaultClasses() backwards compatible? I see it prints a warning in that case, and DefaultStorageClasses will be nil. Is that handled properly? Glancing at BlockWrite, I'm not confident that it is. Does it need to make sure to set [default] ?
Updates at 6610ed6 (rebased)
Test run: developer-run-tests: #2666
Test re-run: developer-run-tests-remainder: #2772
- Updated (& fixed) tests that simulate the case when talking to an old cluster that doesn't have storage classes keep support & exported config defaults.
- Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Lucas Di Pentima wrote:
Peter Amstutz wrote:
Is KeepClient.loadDefaultClasses() backwards compatible? I see it prints a warning in that case, and DefaultStorageClasses will be nil. Is that handled properly? Glancing at BlockWrite, I'm not confident that it is. Does it need to make sure to set [default] ?
Updates at 6610ed6 (rebased)
Test run: developer-run-tests: #2666
Test re-run: developer-run-tests-remainder: #2772
- Updated (& fixed) tests that simulate the case when talking to an old cluster that doesn't have storage classes keep support & exported config defaults.
LGTM
- Status changed from In Progress to Resolved
Also available in: Atom
PDF