Project

General

Profile

Actions

Idea #21935

closed

Identify the public API of the Python SDK and make a plan to hide non-public stuff in an internal module

Added by Peter Amstutz 5 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Start date:
07/11/2024
Due date:
Story points:
-
Release:
Release relationship:
Auto

Description

Proposed:

  • arvados.cache has a single class, merge it into arvados.api. (I believe we can maintain backwards compatibility by saying cache = api inside arvados/__init__.py.)
  • Ditto arvados.safeapi.
  • Move the following modules wholesale into arvados._internal, removing leading underscore as needed:
    • arvados.diskcache
    • arvados.http_to_keep
    • arvados.retry
    • arvados.timer (maybe just move this class to arvados._internal since it's so small?)
    • arvados._normalize_stream
    • arvados._pycurlhelper
    • arvados._ranges
  • In arvados.util:
    • deprecated moves to arvados._internal
    • All the _BaseDir* classes move to arvados._internal.basedirs

Subtasks 2 (0 open2 closed)

Task #21980: Review the planResolvedPeter Amstutz07/11/2024Actions
Task #21987: Review 21935-clean-pysdk-apiResolvedPeter Amstutz07/24/2024Actions
Actions #1

Updated by Brett Smith 5 months ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz 5 months ago

  • Release set to 70
  • Target version changed from Future to Development 2024-07-24 sprint
Actions #3

Updated by Peter Amstutz 5 months ago

  • Assigned To set to Brett Smith
Actions #4

Updated by Peter Amstutz 4 months ago

arvados.safeapi is imported explicitly under certain circumstances (e.g. FUSE, arvados-cwl-runner), so it should probably continue to be import-able at the same location even if it the actual code is moved into another module.

Actions #5

Updated by Brett Smith 4 months ago

Peter Amstutz wrote in #note-4:

arvados.safeapi is imported explicitly under certain circumstances (e.g. FUSE, arvados-cwl-runner), so it should probably continue to be import-able at the same location even if it the actual code is moved into another module.

Yup. The goal for this ticket is to have the module disappear from the web documentation sidebar so there's less to navigate. Plan A is to remove the whole module and define the name in __init__.py as I mentioned. If that's not sufficient to retain backwards compatibility, plan B is to turn the module into a compatibility shim and then explicitly configure pdoc to skip that when generating documentation.

Actions #6

Updated by Brett Smith 4 months ago

Another note from discussion: http_to_keep includes functionality that could be useful in public, but the current API wasn't designed with that in mind and has already had one breaking change between releases. Let's make the current code internal before the API ossifies, and we always have the option to develop a public API for it in the future.

Actions #7

Updated by Brett Smith 4 months ago

21935-clean-pysdk-api @ 5982a7c82355edcb3f3d3dd7c6a29cc8e2e9c107 - developer-run-tests: #4342 - failed only on #21927

  • All agreed upon points are implemented / addressed.
    • As part of moving _normalize_stream and _ranges, I merged them together into one module arvados._internal.streams.
    • During implementation I made a judgment call to leave arvados.retry alone. It is probably the least useful module in the SDK, but something has to carry that title, and it at least has a designed API that's been stable for a long time that implements some Arvados-specific smarts. Given that, avoiding the API break felt better than a move. But if you feel strongly otherwise I wouldn't object to going ahead.
    • As part of moving arvados.util._deprecated, I found we still had arv-migrate-docker19 in the repository, even though its moment is well past. I then also noticed some lingering documentation about the jobs API that was removed in #15397. Scope grew to remove all of that while I was at it.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • See above. Also rendered and reviewed the Python SDK documentation manually.
  • Documentation has been updated.
    • I did write docstrings for new modules. I did not write or modernize docstrings for every class or function that moved; I don't believe that's in scope. The main benefit of this ticket is that it improves the pydoc documentation, both on the web and the CLI, by showing users a more useful list of modules to browse with less internal noise in it.
  • Behaves appropriately at the intended scale (describe intended scale).
    • No change in scale
  • Considered backwards and forwards compatibility issues between client and server.
    • Per earlier conversation, this branch provides shim modules to keep backwards-compatible names for modules and classes that were over-specialized but definitely public-facing.
  • Follows our coding standards and GUI style guidelines.
    • Yes, and the branch includes small PEP 8 improvements throughout, mostly around import organization.
Actions #8

Updated by Peter Amstutz 4 months ago

  • I think this merits an entry in the upgrading notes, even if it is downplayed as "you shouldn't have been using any of these APIs anyway"."
  • StreamFileReader appears to not be used by anything. I think it was used by the legacy API methods on CollectionReader but those appear to have been dropped, so StreamFileReader can probably be dropped as well.
  • ArvadosFileReaderBase could probably safely be made internal by renaming it _ArvadosFileReaderBase.
  • For that matter, _CollectionBase and _RichCollectionBase are also documented as being "abstract" and "internal".

It's not clear to me what the behavior of pdoc for methods that are inherited from classes that are hidden from the documentation. Does it include the method in the public class documentation, or does the method just end up hidden in the documentation? Perhaps there is a pdoc option to render the full API of every class, including inherited methods?

Actions #9

Updated by Peter Amstutz 4 months ago

  • Status changed from New to In Progress
Actions #10

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-07-24 sprint to Development 2024-08-07 sprint
Actions #11

Updated by Brett Smith 4 months ago

Peter Amstutz wrote in #note-8:

  • I think this merits an entry in the upgrading notes, even if it is downplayed as "you shouldn't have been using any of these APIs anyway"."

I've done this, but I question whether the audience who needs to see it will find it. The upgrade notes are under the Admin documentation section and most of it is written for that audience. I don't think people who "just" use client tools will be in the habit of reading this page. This feels like it's more about ticking a box than effective communication with our users.

  • StreamFileReader appears to not be used by anything. I think it was used by the legacy API methods on CollectionReader but those appear to have been dropped, so StreamFileReader can probably be dropped as well.

Done.

  • ArvadosFileReaderBase could probably safely be made internal by renaming it _ArvadosFileReaderBase.
  • For that matter, _CollectionBase and _RichCollectionBase are also documented as being "abstract" and "internal".

I think changing these would be a net loss because of pdoc limitations discussed below. But in a similar spirit, I did mark a bunch of internal implementation classes of arvados.keep as such.

It's not clear to me what the behavior of pdoc for methods that are inherited from classes that are hidden from the documentation. Does it include the method in the public class documentation, or does the method just end up hidden in the documentation? Perhaps there is a pdoc option to render the full API of every class, including inherited methods?

pdoc generates a list of inherited methods at the bottom of the class documentation. However, if pdoc didn't generate documentation for those methods, the links won't go anywhere, so you have no way to learn about them. You can see this today in the documentation for ArvadosFile - there's a list of methods inherited from _FileLikeObjectBase, but no way to get details on what they are.

Fortunately those methods are simple enough it's not a huge deal. But if we made some of these base classes private, the gap would become much bigger. I think making the documentation worse this way isn't worth the upside of marking some base classes private (that have already been public for a long time already).

I don't see any options to adjust this behavior short of writing our own documentation templates, which feels out of scope for this ticket. But if you see/know something I don't, I'm all ears. For sure I'd be happier if I didn't have to choose between good documentation and good code organization.

Now at 6c81e47873d6ae90125ca2a473ebbce593fec1c9 - developer-run-tests: #4356

Actions #12

Updated by Peter Amstutz 4 months ago

Brett Smith wrote in #note-11:

Peter Amstutz wrote in #note-8:

  • I think this merits an entry in the upgrading notes, even if it is downplayed as "you shouldn't have been using any of these APIs anyway"."

I've done this, but I question whether the audience who needs to see it will find it. The upgrade notes are under the Admin documentation section and most of it is written for that audience. I don't think people who "just" use client tools will be in the habit of reading this page. This feels like it's more about ticking a box than effective communication with our users.

You are right, for better visibility this also should go in the release notes (or the release notes link to the upgrading notes) and possibly other places. But I think it is important to mention it here since it is relevant to backwards compatibility.

[...]

I don't see any options to adjust this behavior short of writing our own documentation templates, which feels out of scope for this ticket. But if you see/know something I don't, I'm all ears. For sure I'd be happier if I didn't have to choose between good documentation and good code organization.

Thank you for researching this. I agree with the decision to keep them since the base classes declare essential methods.

Now at 6c81e47873d6ae90125ca2a473ebbce593fec1c9 - developer-run-tests: #4356

This LGTM!

Actions #13

Updated by Brett Smith 4 months ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF