Idea #19686
closedarvados.api() should return ThreadSafeApiCache
Added by Peter Amstutz about 2 years ago. Updated about 2 years ago.
Description
The python SDK method arvados.api() should return a ThreadSafeApiCache object instead of the plain API object. Benefits:
- It is threadsafe
- It creates a corresponding keep client object, which can be accessed by classes like Collection() that take an api client without having to manage passing an extra keep client object around.
- ThreadSafeApiCache uses python thread-local storage, we should be aware of how items in thread-local storage get cleaned up when the thread terminates.
This is in response to users running into this issue.
Updated by Peter Amstutz about 2 years ago
- Subject changed from arvados.api() returns ThreadSafeApiCache to arvados.api() should return ThreadSafeApiCache
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-11-23 sprint to 2022-12-07 Sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-12-07 Sprint to 2022-11-23 sprint
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-11-23 sprint to 2022-12-07 Sprint
Updated by Brett Smith about 2 years ago
19686-threadsafe-api-default @ c953b61d6
New API¶
At standup, we discussed that we wanted to let users still have a way to build just the Resource
object. I have done that by adding a new api_client
function that does just that, it's the actual object-building part of the old api
function. I took this approach for two reasons:
- IMO, having a flag like
threadsafe=False
that substantially changes what the method does (in this case, changing its return type) is a bad code smell. - This gave me an opportunity to build a clean API for
api_client
. It just builds objects, and takes the parameters required for that process as arguments. It doesn't load configuration, or build the discovery URL, or any of that. Building the inputs is a job for other functions.
From here I broke out two new functions: normalize_api_kwargs
is the argument-parsing part of the old api
, and api_kwargs_from_config
is the argument-building part of the old api_from_config
. IMO these are pretty safe to make public: all they do is build dictionaries from dictionaries, and the names are pretty precise on that point and not taking up valuable short names. That said, I don't feel super strongly about keeping them public. If we'd rather they be flagged private, I could go along with that.
With these separated, it's easier to compose api
, api_from_config
, and ThreadSafeApiCache
. That's all the first commit does: introduce the new APIs, and use them to implement the old APIs. It's admittedly a huge diff, but IMO the end result is easier to follow and nicer to use.
API compatibility notes¶
The second commit actually makes the requested change. I had to add some dedicated handling of the request_id
attribute to ThreadSafeApiCache
to keep the tests passing.
My docstrings say ThreadSafeApiCache
is API-compatible, but that's only mostly true. When we build the Resource
object we attach a bunch of attributes on it. Any code that writes these attributes may see different behavior when they start getting a ThreadSafeApiCache
instead. It's not clear whether that's intended to be a supported part of the API or not. For now I decided to just use tested behavior as my threshold for what we need to support.
isinstance(arv_client, Resource)
used to return True and will now return False now too. We could keep it returning True
if we wanted, by overriding __instancecheck__
. IMO this is kind of a trade-off: doing so would maintain stricter API compatibility, but the result could be really confusing in some contexts. For now I erred on the side of less code, but I'm also open to discussing this.
Updated by Brett Smith about 2 years ago
- Status changed from New to In Progress
Updated by Tom Clegg about 2 years ago
This seems a bit silly to mention but is it necessary/advantageous to have the module docstring before the license comment? PEP says first statement in a module can be a module docstring and I don't think a comment is a statement, but maybe some tools are more picky? License at the top is a little easier for "check for license" tooling and seems more logical in that the license applies to the docstring but not vice versa.
TIL "*" can be used like this to prevent callers from accidentally creating future incompatibilities by passing cache
etc. as positional parameters.
def api_client( version, discoveryServiceUrl, token, *, cache=True, http=None,
Everything LGTM, thanks!
Updated by Brett Smith about 2 years ago
Tom Clegg wrote in #note-13:
This seems a bit silly to mention but is it necessary/advantageous to have the module docstring before the license comment?
No need, I just thought it was nicer for readers and didn't know any reason not to. No problem moving it to keep our tooling life easier. Done @ cc4492ecf.
TIL "*" can be used like this to prevent callers from accidentally creating future incompatibilities by passing
cache
etc. as positional parameters.
Yeah, this is new in Python 3 so we couldn't start using it until we dropped Python 2 support. But now that we have I think it's a great tool to help keep our API boundaries cleaner. Thanks.
Updated by Brett Smith about 2 years ago
Last night, for reasons I can't explain, it suddenly dawned on me that the restructuring I did to compose the existing API client constructors out of normalize_api_kwargs
and api_kwargs_from_config
meant that some functions that were previously being tested implicitly by other tests were no longer being tested. So, I wrote a bunch of basic tests to address that. Sure enough, in the process, I found one backwards compatibility break: if old code was calling ThreadSafeApiCache
with version='v1'
in api_params
(to stop the SDK from complaining about unspecified version), this code would now fail because the new version would call a kwargs function with version
specified twice, which is an error.
I've pushed two additional commits to the branch. The first adds a bunch of new tests for all the public-facing functions that didn't require any code changes. The second addresses that backwards compatibility break. Please take a look. Thanks.
Updated by Brett Smith about 2 years ago
Brett Smith wrote in #note-16:
I found one backwards compatibility break: if old code was calling
ThreadSafeApiCache
withversion='v1'
inapi_params
(to stop the SDK from complaining about unspecified version), this code would now fail because the new version would call a kwargs function withversion
specified twice, which is an error.
Ugh, sorry, no, this is wrong, I'm having DUELING SHOWER THOUGHTS.
The pre-branch code avoided the warning by just hard-coding the 'v1'
version argument to the constructor. If you tried to specify the version in api_params
, it would raise the same error for specifying the version argument twice. So there's no need to support this case.
We could support specifying version
in api_params
this way, it's not difficult and clear enough, but IMO it's not worth it. "There should be one, and preferably only one, obvious way to do it," and in the branch that should be passing the explicit version
argument to ThreadSafeApiCache
. This support is minimal extra complexity for literally zero value. So I think we should merge without this final commit.
I would still like to see the tests reviewed and merged, because well, I put in the effort and IMHO they're nice tests. But I could spin that off to a separate post-merge branch or something if we like.
Updated by Peter Amstutz about 2 years ago
- Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint
Updated by Tom Clegg about 2 years ago
Makes sense. version-in-api_params update and tests LGTM.
Now that ThreadSafeApiCache()
without version=
logs "INFO: Using default API version. Call arvados.api('v1') instead"
instead of silently forcing v1, we should take that advice and update usage in sdk/cwl and services/fuse to specify v1.
Updated by Brett Smith about 2 years ago
Tom Clegg wrote in #note-19:
Now that
ThreadSafeApiCache()
withoutversion=
logs"INFO: Using default API version. Call arvados.api('v1') instead"
instead of silently forcing v1, we should take that advice and update usage in sdk/cwl and services/fuse to specify v1.
Good catch, thank you. Done in 9c872c735. Note also a small test fix in ee357f36e.
Updated by Brett Smith about 2 years ago
- % Done changed from 0 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados|8171328873751d5bfd47cd9da3f6ff9a66c84659.