Idea #21298 (open)
PySDK returns rich objects
Description
Today all the methods on the PySDK return raw dictionaries deserialized from JSON. This is not very ergonomic because it makes common operations very verbose. For example, getting a datetime field as a datetime object:
import ciso8601
project = arv_client.groups().get(uuid=...).execute()
project_created = ciso8601.parse_datetime(project['created_at'])
Or cross-referencing objects:
collection = arv_client.collections().get(uuid=...).execute()
project = arv_client.groups().get(uuid=collection['owner_uuid']).execute()
It would be much nicer if the original API calls returned rich objects that had methods to perform these common operations and return the result.
However, there's a bunch of dictionary-based code out there today. The path that would provide the smoothest transition from current clients to richer ones would be to enhance those dictionaries in a backwards-compatible way, rather than building a separate parallel API that clients have to be rewritten to take advantage of.
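Part of what makes the in-place approach backwards compatible is that a dict subclass is still a dict: existing code that indexes keys or passes the result to json.dumps keeps working. The sketch below illustrates that with the created_at example from above. It assumes plain dict subclasses as stand-ins for the generated TypedDicts, and RichResourceMixin, get_datetime, and parse_arvados_timestamp are hypothetical names. The small standard-library parser stands in for ciso8601 only so the sketch has no third-party dependency, and the timestamp format it accepts is an assumption.

```python
import datetime
import json
import re

# Assumed Arvados timestamp shape, e.g. "2023-11-01T12:34:56.123456789Z".
_TIMESTAMP_RE = re.compile(
    r'(?P<base>\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)'
    r'(?:\.(?P<frac>\d+))?'
    r'(?P<tz>Z|[+-]\d\d:\d\d)?'
)


def parse_arvados_timestamp(value):
    """Hypothetical helper: parse an ISO 8601 timestamp using only the stdlib."""
    match = _TIMESTAMP_RE.fullmatch(value)
    if match is None:
        raise ValueError(f'unrecognized timestamp: {value!r}')
    text = match['base']
    if match['frac']:
        # datetime stores microseconds, so truncate or pad the fraction to 6 digits.
        text += '.' + match['frac'][:6].ljust(6, '0')
    if match['tz']:
        text += '+00:00' if match['tz'] == 'Z' else match['tz']
    return datetime.datetime.fromisoformat(text)


class RichResourceMixin:
    """Hypothetical mixin with helpers shared by every resource type."""

    def get_datetime(self, key):
        """Return the timestamp stored under `key` as a datetime, or None if unset."""
        value = self.get(key)
        return None if value is None else parse_arvados_timestamp(value)


class Group(RichResourceMixin, dict):
    """Stand-in for the Group TypedDict in arvados.api_resources."""


project = Group({
    'uuid': 'zzzzz-j7d0g-000000000000000',
    'created_at': '2023-11-01T12:34:56.123456789Z',
})

# Existing dictionary-style code keeps working unchanged:
print(project['created_at'])  # 2023-11-01T12:34:56.123456789Z
print(json.dumps(project))
# New code can use the added method instead of calling a parser by hand:
project_created = project.get_datetime('created_at')
print(project_created)  # 2023-11-01 12:34:56.123456+00:00
```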
Proposed way to do that:
- Write our own subclass of googleapiclient.model.JsonModel. Its deserialize method starts by calling the super method, then tries to determine the kind of object it's looking at from the return value's uuid, kind, etc. Assuming it finds a match (which should be the normal case), it promotes the return value's class to the corresponding TypedDict from arvados.api_resources. The model subclass could also be responsible for holding shared state (e.g., the originating API client) and sharing it with those return values as needed. arvados.api.api_client constructs this model and passes it to googleapiclient.discovery.Resource to hook everything together.
- Desired methods are added to those TypedDicts. As part of this, we will probably want to introduce a base mixin class with common methods for datetime parsing, fetching arbitrary objects by UUID, etc.
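The two steps above could look roughly like the sketch below. Apart from googleapiclient.model.JsonModel, its deserialize method, and the .get(uuid=...).execute() call pattern, everything here is hypothetical: the class names, the KINDS and UUID_INFIXES lookup tables, and the fetch and owner helpers (which stand in for the base mixin's methods). It uses plain dict subclasses rather than the generated TypedDicts, because recent Python versions reject a TypedDict that also inherits from a non-TypedDict base class such as a mixin; how the real generated classes would handle that is left open. If google-api-python-client is not installed, a minimal stand-in JsonModel is defined so the sketch still runs.

```python
import json

try:
    from googleapiclient.model import JsonModel
except ImportError:
    # Minimal stand-in so the sketch runs without google-api-python-client.
    class JsonModel:
        def __init__(self, data_wrapper=False):
            self._data_wrapper = data_wrapper

        def deserialize(self, content):
            if isinstance(content, bytes):
                content = content.decode('utf-8')
            return json.loads(content)


class ArvadosResource(dict):
    """Hypothetical base class for rich return values (stand-in for the TypedDicts)."""

    api_client = None  # set per instance by RichJsonModel

    def fetch(self, uuid):
        """Fetch any object by UUID, choosing the API resource from the UUID infix."""
        resource_name = UUID_INFIXES[uuid.split('-')[1]][0]
        return getattr(self.api_client, resource_name)().get(uuid=uuid).execute()

    def owner(self):
        """Fetch the object named by this object's owner_uuid."""
        return self.fetch(self['owner_uuid'])


class Collection(ArvadosResource):
    pass


class Group(ArvadosResource):
    pass


# Assumed lookup tables; a real version would cover every Arvados object type.
KINDS = {'arvados#collection': Collection, 'arvados#group': Group}
UUID_INFIXES = {'4zz18': ('collections', Collection), 'j7d0g': ('groups', Group)}


class RichJsonModel(JsonModel):
    """Hypothetical model that promotes deserialized dicts to rich classes."""

    def __init__(self, api_client=None, **kwargs):
        super().__init__(**kwargs)
        self.api_client = api_client

    def deserialize(self, content):
        return self._promote(super().deserialize(content))

    def _promote(self, value):
        if not isinstance(value, dict):
            return value
        if isinstance(value.get('items'), list):
            # List responses: promote each item, keep the envelope a plain dict.
            value = dict(value, items=[self._promote(item) for item in value['items']])
        cls = KINDS.get(value.get('kind'))
        if cls is None:
            # Fall back to the UUID infix when "kind" is missing or unknown.
            uuid_parts = str(value.get('uuid', '')).split('-')
            if len(uuid_parts) == 3 and uuid_parts[1] in UUID_INFIXES:
                cls = UUID_INFIXES[uuid_parts[1]][1]
        if cls is None:
            return value  # unrecognized object: leave it as a plain dict
        # A plain dict's __class__ cannot be reassigned, so copy into the subclass.
        promoted = cls(value)
        promoted.api_client = self.api_client
        return promoted
```

The owner method mirrors the cross-referencing example in the description: collection.owner() replaces the second explicit groups().get() call. Because the API client only exists once the model has been handed to googleapiclient, the sketch lets api_client be assigned after the model is constructed.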
Proposed organization:
I propose building from arvados.api_resources because the TypedDicts we already have in main today are a great foundation: the data's already there; we just need to add the methods to them. We would need to extend the code generation to support that, but that's not a huge deal.
The bigger deal is this: right now arvados.api_resources is only generated at build time, and none of the rest of the code relies on it at runtime. This is partly because the tests assume you can run directly out of the source tree without building first. If we want the rest of the SDK to start relying on generated code at runtime, we should fix all of our test infrastructure etc. to build the module before testing it.
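One way to make the tests build the module first, assuming the suite is run with pytest, is a make-style staleness check in a conftest.py hook: regenerate arvados/api_resources.py whenever it is missing or older than its inputs. The file paths, the ensure_generated helper, and the generator script name and arguments in _run_generator are all hypothetical placeholders; only the pytest_sessionstart hook name comes from pytest itself.

```python
import subprocess
import sys
from pathlib import Path


def ensure_generated(output, sources, generate):
    """Call generate(output) if output is missing or older than any source file.

    Returns True if the module was (re)generated, False if it was up to date.
    """
    output = Path(output)
    if output.exists():
        output_mtime = output.stat().st_mtime
        if all(Path(src).stat().st_mtime <= output_mtime for src in sources):
            return False
    generate(output)
    return True


def _run_generator(output):
    # Hypothetical: replace with the real code generation command.
    subprocess.run(
        [sys.executable, 'gen_api_resources.py', str(output)],
        check=True,
    )


# In conftest.py: pytest calls this hook before collecting tests, so test
# modules can import the generated code. Paths are relative to where pytest runs.
def pytest_sessionstart(session):
    ensure_generated(
        'arvados/api_resources.py',       # generated module (path assumed)
        ['arvados-v1-discovery.json'],    # code generation input (path assumed)
        _run_generator,
    )
```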
If we want to use code generation without fixing the test infrastructure, we could run the code generation on every setuptools invocation, like we currently do with _version.py. This comes with its own set of problems, and may be harder to support as we switch to a standard build system (#20311). But we already have those problems, so it doesn't make things much worse: we would just have two instances of them instead of one.
We could also abandon code generation, take the current arvados.api_resources source, and start maintaining it by hand. I like this option least: we just built the code generation and already have it, so why not make use of it? But it is an option.
This ticket can be split into parts that make sense to do independently:
- Any pre-work: requiring a build before tests, doing code generation at setup time, or ripping out code generation and committing a snapshot of arvados.api_resources.
- Building and integrating the model to return the TypedDicts.
- If we keep the code generation, there's probably a ticket in here to extend it with support for adding our own methods.
- As many tickets as we like for adding desired methods to the returned TypedDicts.