Project

General

Profile

Actions

Idea #19686

closed

arvados.api() should return ThreadSafeApiCache

Added by Peter Amstutz about 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Start date:
11/29/2022
Due date:
Story points:
-
Release relationship:
Auto

Description

The python SDK method arvados.api() should return a ThreadSafeApiCache object instead of the plain API object. Benefits:

  • It is threadsafe
  • It creates a corresponding keep client object, which can be accessed by classes like Collection() that take an api client without having to manage passing an extra keep client object around.
  • ThreadSafeApiCache uses python thread-local storage, we should be aware of how items in thread-local storage get cleaned up when the thread terminates.

This is in response to users running into this issue.


Subtasks 1 (0 open1 closed)

Task #19802: Review 19686-threadsafe-api-defaultResolvedBrett Smith11/29/2022Actions
Actions #1

Updated by Peter Amstutz about 2 years ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz about 2 years ago

  • Subject changed from arvados.api() returns ThreadSafeApiCache to arvados.api() should return ThreadSafeApiCache
Actions #3

Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2022-11-23 sprint to 2022-12-07 Sprint
Actions #4

Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2022-12-07 Sprint to 2022-11-23 sprint
Actions #5

Updated by Peter Amstutz about 2 years ago

  • Assigned To deleted (Peter Amstutz)
Actions #6

Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2022-11-23 sprint to 2022-12-07 Sprint
Actions #7

Updated by Peter Amstutz about 2 years ago

  • Description updated (diff)
Actions #8

Updated by Peter Amstutz about 2 years ago

  • Description updated (diff)
Actions #9

Updated by Peter Amstutz about 2 years ago

  • Description updated (diff)
Actions #10

Updated by Peter Amstutz about 2 years ago

  • Assigned To set to Brett Smith
Actions #11

Updated by Brett Smith about 2 years ago

19686-threadsafe-api-default @ c953b61d6

New API

At standup, we discussed that we wanted to let users still have a way to build just the Resource object. I have done that by adding a new api_client function that does just that, it's the actual object-building part of the old api function. I took this approach for two reasons:

  1. IMO, having a flag like threadsafe=False that substantially changes what the method does (in this case, changing its return type) is a bad code smell.
  2. This gave me an opportunity to build a clean API for api_client. It just builds objects, and takes the parameters required for that process as arguments. It doesn't load configuration, or build the discovery URL, or any of that. Building the inputs is a job for other functions.

From here I broke out two new functions: normalize_api_kwargs is the argument-parsing part of the old api, and api_kwargs_from_config is the argument-building part of the old api_from_config. IMO these are pretty safe to make public: all they do is build dictionaries from dictionaries, and the names are pretty precise on that point and not taking up valuable short names. That said, I don't feel super strongly about keeping them public. If we'd rather they be flagged private, I could go along with that.

With these separated, it's easier to compose api, api_from_config, and ThreadSafeApiCache. That's all the first commit does: introduce the new APIs, and use them to implement the old APIs. It's admittedly a huge diff, but IMO the end result is easier to follow and nicer to use.

API compatibility notes

The second commit actually makes the requested change. I had to add some dedicated handling of the request_id attribute to ThreadSafeApiCache to keep the tests passing.

My docstrings say ThreadSafeApiCache is API-compatible, but that's only mostly true. When we build the Resource object we attach a bunch of attributes on it. Any code that writes these attributes may see different behavior when they start getting a ThreadSafeApiCache instead. It's not clear whether that's intended to be a supported part of the API or not. For now I decided to just use tested behavior as my threshold for what we need to support.

isinstance(arv_client, Resource) used to return True and will now return False now too. We could keep it returning True if we wanted, by overriding __instancecheck__. IMO this is kind of a trade-off: doing so would maintain stricter API compatibility, but the result could be really confusing in some contexts. For now I erred on the side of less code, but I'm also open to discussing this.

Actions #12

Updated by Brett Smith about 2 years ago

  • Status changed from New to In Progress
Actions #13

Updated by Tom Clegg about 2 years ago

This seems a bit silly to mention but is it necessary/advantageous to have the module docstring before the license comment? PEP says first statement in a module can be a module docstring and I don't think a comment is a statement, but maybe some tools are more picky? License at the top is a little easier for "check for license" tooling and seems more logical in that the license applies to the docstring but not vice versa.

TIL "*" can be used like this to prevent callers from accidentally creating future incompatibilities by passing cache etc. as positional parameters.

def api_client(
        version,
        discoveryServiceUrl,
        token,
        *,
        cache=True,
        http=None,

Everything LGTM, thanks!

Actions #14

Updated by Brett Smith about 2 years ago

Tom Clegg wrote in #note-13:

This seems a bit silly to mention but is it necessary/advantageous to have the module docstring before the license comment?

No need, I just thought it was nicer for readers and didn't know any reason not to. No problem moving it to keep our tooling life easier. Done @ cc4492ecf.

TIL "*" can be used like this to prevent callers from accidentally creating future incompatibilities by passing cache etc. as positional parameters.

Yeah, this is new in Python 3 so we couldn't start using it until we dropped Python 2 support. But now that we have I think it's a great tool to help keep our API boundaries cleaner. Thanks.

Actions #15

Updated by Tom Clegg about 2 years ago

LGTM, thanks

Actions #16

Updated by Brett Smith about 2 years ago

Last night, for reasons I can't explain, it suddenly dawned on me that the restructuring I did to compose the existing API client constructors out of normalize_api_kwargs and api_kwargs_from_config meant that some functions that were previously being tested implicitly by other tests were no longer being tested. So, I wrote a bunch of basic tests to address that. Sure enough, in the process, I found one backwards compatibility break: if old code was calling ThreadSafeApiCache with version='v1' in api_params (to stop the SDK from complaining about unspecified version), this code would now fail because the new version would call a kwargs function with version specified twice, which is an error.

I've pushed two additional commits to the branch. The first adds a bunch of new tests for all the public-facing functions that didn't require any code changes. The second addresses that backwards compatibility break. Please take a look. Thanks.

Actions #17

Updated by Brett Smith about 2 years ago

Brett Smith wrote in #note-16:

I found one backwards compatibility break: if old code was calling ThreadSafeApiCache with version='v1' in api_params (to stop the SDK from complaining about unspecified version), this code would now fail because the new version would call a kwargs function with version specified twice, which is an error.

Ugh, sorry, no, this is wrong, I'm having DUELING SHOWER THOUGHTS.

The pre-branch code avoided the warning by just hard-coding the 'v1' version argument to the constructor. If you tried to specify the version in api_params, it would raise the same error for specifying the version argument twice. So there's no need to support this case.

We could support specifying version in api_params this way, it's not difficult and clear enough, but IMO it's not worth it. "There should be one, and preferably only one, obvious way to do it," and in the branch that should be passing the explicit version argument to ThreadSafeApiCache. This support is minimal extra complexity for literally zero value. So I think we should merge without this final commit.

I would still like to see the tests reviewed and merged, because well, I put in the effort and IMHO they're nice tests. But I could spin that off to a separate post-merge branch or something if we like.

Actions #18

Updated by Peter Amstutz about 2 years ago

  • Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint
Actions #19

Updated by Tom Clegg about 2 years ago

Makes sense. version-in-api_params update and tests LGTM.

Now that ThreadSafeApiCache() without version= logs "INFO: Using default API version. Call arvados.api('v1') instead" instead of silently forcing v1, we should take that advice and update usage in sdk/cwl and services/fuse to specify v1.

Actions #20

Updated by Brett Smith about 2 years ago

Tom Clegg wrote in #note-19:

Now that ThreadSafeApiCache() without version= logs "INFO: Using default API version. Call arvados.api('v1') instead" instead of silently forcing v1, we should take that advice and update usage in sdk/cwl and services/fuse to specify v1.

Good catch, thank you. Done in 9c872c735. Note also a small test fix in ee357f36e.

Actions #21

Updated by Tom Clegg about 2 years ago

LGTM, thanks

Actions #22

Updated by Brett Smith about 2 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #23

Updated by Peter Amstutz about 2 years ago

  • Release set to 47
Actions

Also available in: Atom PDF