Idea #21935
closedIdentify the public API of the Python SDK and make a plan to hide non-public stuff in an internal module
Description
Proposed:
arvados.cache
has a single class, merge it intoarvados.api
. (I believe we can maintain backwards compatibility by sayingcache = api
insidearvados/__init__.py
.)- Ditto
arvados.safeapi
. - Move the following modules wholesale into
arvados._internal
, removing leading underscore as needed:arvados.diskcache
arvados.http_to_keep
arvados.retry
arvados.timer
(maybe just move this class toarvados._internal
since it's so small?)arvados._normalize_stream
arvados._pycurlhelper
arvados._ranges
- In
arvados.util
:deprecated
moves toarvados._internal
- All the
_BaseDir*
classes move toarvados._internal.basedirs
Updated by Peter Amstutz 5 months ago
- Release set to 70
- Target version changed from Future to Development 2024-07-24 sprint
Updated by Peter Amstutz 4 months ago
arvados.safeapi
is imported explicitly under certain circumstances (e.g. FUSE, arvados-cwl-runner), so it should probably continue to be import-able at the same location even if it the actual code is moved into another module.
Updated by Brett Smith 4 months ago
Peter Amstutz wrote in #note-4:
arvados.safeapi
is imported explicitly under certain circumstances (e.g. FUSE, arvados-cwl-runner), so it should probably continue to be import-able at the same location even if it the actual code is moved into another module.
Yup. The goal for this ticket is to have the module disappear from the web documentation sidebar so there's less to navigate. Plan A is to remove the whole module and define the name in __init__.py
as I mentioned. If that's not sufficient to retain backwards compatibility, plan B is to turn the module into a compatibility shim and then explicitly configure pdoc to skip that when generating documentation.
Updated by Brett Smith 4 months ago
Another note from discussion: http_to_keep
includes functionality that could be useful in public, but the current API wasn't designed with that in mind and has already had one breaking change between releases. Let's make the current code internal before the API ossifies, and we always have the option to develop a public API for it in the future.
Updated by Brett Smith 4 months ago
21935-clean-pysdk-api @ 5982a7c82355edcb3f3d3dd7c6a29cc8e2e9c107 - developer-run-tests: #4342 - failed only on #21927
- All agreed upon points are implemented / addressed.
- As part of moving
_normalize_stream
and_ranges
, I merged them together into one modulearvados._internal.streams
. - During implementation I made a judgment call to leave
arvados.retry
alone. It is probably the least useful module in the SDK, but something has to carry that title, and it at least has a designed API that's been stable for a long time that implements some Arvados-specific smarts. Given that, avoiding the API break felt better than a move. But if you feel strongly otherwise I wouldn't object to going ahead. - As part of moving
arvados.util._deprecated
, I found we still hadarv-migrate-docker19
in the repository, even though its moment is well past. I then also noticed some lingering documentation about the jobs API that was removed in #15397. Scope grew to remove all of that while I was at it.
- As part of moving
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described
- See above. Also rendered and reviewed the Python SDK documentation manually.
- Documentation has been updated.
- I did write docstrings for new modules. I did not write or modernize docstrings for every class or function that moved; I don't believe that's in scope. The main benefit of this ticket is that it improves the pydoc documentation, both on the web and the CLI, by showing users a more useful list of modules to browse with less internal noise in it.
- Behaves appropriately at the intended scale (describe intended scale).
- No change in scale
- Considered backwards and forwards compatibility issues between client and server.
- Per earlier conversation, this branch provides shim modules to keep backwards-compatible names for modules and classes that were over-specialized but definitely public-facing.
- Follows our coding standards and GUI style guidelines.
- Yes, and the branch includes small PEP 8 improvements throughout, mostly around import organization.
Updated by Peter Amstutz 4 months ago
- I think this merits an entry in the upgrading notes, even if it is downplayed as "you shouldn't have been using any of these APIs anyway"."
StreamFileReader
appears to not be used by anything. I think it was used by the legacy API methods onCollectionReader
but those appear to have been dropped, soStreamFileReader
can probably be dropped as well.ArvadosFileReaderBase
could probably safely be made internal by renaming it_ArvadosFileReaderBase
.- For that matter,
_CollectionBase
and_RichCollectionBase
are also documented as being "abstract" and "internal".
It's not clear to me what the behavior of pdoc
for methods that are inherited from classes that are hidden from the documentation. Does it include the method in the public class documentation, or does the method just end up hidden in the documentation? Perhaps there is a pdoc option to render the full API of every class, including inherited methods?
Updated by Peter Amstutz 4 months ago
- Target version changed from Development 2024-07-24 sprint to Development 2024-08-07 sprint
Updated by Brett Smith 4 months ago
Peter Amstutz wrote in #note-8:
- I think this merits an entry in the upgrading notes, even if it is downplayed as "you shouldn't have been using any of these APIs anyway"."
I've done this, but I question whether the audience who needs to see it will find it. The upgrade notes are under the Admin documentation section and most of it is written for that audience. I don't think people who "just" use client tools will be in the habit of reading this page. This feels like it's more about ticking a box than effective communication with our users.
StreamFileReader
appears to not be used by anything. I think it was used by the legacy API methods onCollectionReader
but those appear to have been dropped, soStreamFileReader
can probably be dropped as well.
Done.
ArvadosFileReaderBase
could probably safely be made internal by renaming it_ArvadosFileReaderBase
.- For that matter,
_CollectionBase
and_RichCollectionBase
are also documented as being "abstract" and "internal".
I think changing these would be a net loss because of pdoc limitations discussed below. But in a similar spirit, I did mark a bunch of internal implementation classes of arvados.keep
as such.
It's not clear to me what the behavior of
pdoc
for methods that are inherited from classes that are hidden from the documentation. Does it include the method in the public class documentation, or does the method just end up hidden in the documentation? Perhaps there is a pdoc option to render the full API of every class, including inherited methods?
pdoc generates a list of inherited methods at the bottom of the class documentation. However, if pdoc didn't generate documentation for those methods, the links won't go anywhere, so you have no way to learn about them. You can see this today in the documentation for ArvadosFile
- there's a list of methods inherited from _FileLikeObjectBase
, but no way to get details on what they are.
Fortunately those methods are simple enough it's not a huge deal. But if we made some of these base classes private, the gap would become much bigger. I think making the documentation worse this way isn't worth the upside of marking some base classes private (that have already been public for a long time already).
I don't see any options to adjust this behavior short of writing our own documentation templates, which feels out of scope for this ticket. But if you see/know something I don't, I'm all ears. For sure I'd be happier if I didn't have to choose between good documentation and good code organization.
Now at 6c81e47873d6ae90125ca2a473ebbce593fec1c9 - developer-run-tests: #4356
Updated by Peter Amstutz 4 months ago
Brett Smith wrote in #note-11:
Peter Amstutz wrote in #note-8:
- I think this merits an entry in the upgrading notes, even if it is downplayed as "you shouldn't have been using any of these APIs anyway"."
I've done this, but I question whether the audience who needs to see it will find it. The upgrade notes are under the Admin documentation section and most of it is written for that audience. I don't think people who "just" use client tools will be in the habit of reading this page. This feels like it's more about ticking a box than effective communication with our users.
You are right, for better visibility this also should go in the release notes (or the release notes link to the upgrading notes) and possibly other places. But I think it is important to mention it here since it is relevant to backwards compatibility.
[...]
I don't see any options to adjust this behavior short of writing our own documentation templates, which feels out of scope for this ticket. But if you see/know something I don't, I'm all ears. For sure I'd be happier if I didn't have to choose between good documentation and good code organization.
Thank you for researching this. I agree with the decision to keep them since the base classes declare essential methods.
Now at 6c81e47873d6ae90125ca2a473ebbce593fec1c9 - developer-run-tests: #4356
This LGTM!
Updated by Brett Smith 4 months ago
- Status changed from In Progress to Resolved