Project

General

Profile

Actions

Feature #19929

closed

Improve documentation in the discovery document

Added by Brett Smith almost 2 years ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Story points:
1.0
Release:
Release relationship:
Auto

Description

Just like every public class and method in the Python SDK should have its own docstring, every public schema, resource type, and method in the API server's discovery document should have a consistent description. SDKs in every language can use this to generate documentation.

IMO unless some other requirement overrides, description fields should follow the same Markdown formatting rules we use for Python docstrings, for the same reasons: to provide a presentation that looks good in both plaintext and HTML.

  • Make sure all parameters of all methods are advertised. Known missing:
    • The select parameter of get methods
  • Make sure all parameters of all methods have a description. Known missing:
    • list parameters count, distinct, filters, limit, offset, order, where
  • Make sure all method parameters with default values advertise that default correctly—with a possible exception for parameters that must be cluster-specific like cluster_id. Known missing:
    • The select attribute of most methods
    • list parameters bypass_federation, filters, order, where.

Don't bother with anything deprecated and/or do this after the "declutter the API story".


Subtasks 1 (0 open1 closed)

Task #20444: Review 19929-fill-discovery-documentResolvedBrett Smith09/10/2024Actions

Related issues

Related to Arvados - Support #18799: Strategy to generate Python SDK docstrings based on API docsResolvedBrett Smith04/14/2023Actions
Related to Arvados - Bug #3817: [API] Discovery document schema does not include generated fields like job dependenciesNewActions
Related to Arvados - Feature #15397: Declutter the APIResolvedTom CleggActions
Related to Arvados - Feature #21909: Refresh R SDKs based on 3.0 API changesResolvedBrett SmithActions
Actions #1

Updated by Brett Smith almost 2 years ago

  • Related to Support #18799: Strategy to generate Python SDK docstrings based on API docs added
Actions #2

Updated by Brett Smith almost 2 years ago

  • Story points set to 1.0

Estimating one story point based on:

$ git grep -c description: services/api/app/controllers/
services/api/app/controllers/application_controller.rb:8
services/api/app/controllers/arvados/v1/collections_controller.rb:4
services/api/app/controllers/arvados/v1/container_requests_controller.rb:2
services/api/app/controllers/arvados/v1/groups_controller.rb:7
services/api/app/controllers/arvados/v1/nodes_controller.rb:2
services/api/app/controllers/arvados/v1/schema_controller.rb:31

Point being, most of the work is documenting the common methods once in schema_controller, there aren't tons of exceptions and additions beyond that.

Actions #3

Updated by Peter Amstutz almost 2 years ago

  • Release set to 60
Actions #4

Updated by Brett Smith almost 2 years ago

  • Release deleted (60)
  • Target version set to To be scheduled

I am proposing we do this as part of improving our documentation generally. It's been groomed. We don't have to take this approach, but I would like to at least discuss it and not postpone it automatically.

Actions #5

Updated by Brett Smith almost 2 years ago

  • Related to Bug #3817: [API] Discovery document schema does not include generated fields like job dependencies added
Actions #6

Updated by Brett Smith over 1 year ago

  • Description updated (diff)
Actions #7

Updated by Peter Amstutz over 1 year ago

  • Target version changed from To be scheduled to Development 2023-05-10 sprint
Actions #8

Updated by Peter Amstutz over 1 year ago

  • Assigned To set to Brett Smith
Actions #9

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-05-10 sprint to Development 2023-05-24 sprint
Actions #10

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-05-24 sprint to Development 2023-06-07
Actions #11

Updated by Peter Amstutz over 1 year ago

  • Target version changed from Development 2023-06-07 to To be scheduled
Actions #12

Updated by Peter Amstutz about 1 year ago

  • Target version changed from To be scheduled to Development 2023-11-29 sprint
Actions #13

Updated by Peter Amstutz about 1 year ago

Actions #14

Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)
Actions #15

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-11-29 sprint to Development 2024-01-03 sprint
Actions #16

Updated by Peter Amstutz 12 months ago

  • Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
Actions #17

Updated by Peter Amstutz 12 months ago

  • Category changed from Documentation to API
  • Tracker changed from Bug to Support
Actions #18

Updated by Peter Amstutz 12 months ago

  • Tracker changed from Support to Idea
Actions #19

Updated by Peter Amstutz 11 months ago

  • Target version changed from Development 2024-01-17 sprint to Development 2024-01-31 sprint
Actions #20

Updated by Peter Amstutz 11 months ago

  • Target version changed from Development 2024-01-31 sprint to Development 2024-02-14 sprint
Actions #21

Updated by Peter Amstutz 10 months ago

  • Target version changed from Development 2024-02-14 sprint to Development 2024-02-28 sprint
Actions #22

Updated by Peter Amstutz 9 months ago

  • Target version changed from Development 2024-02-28 sprint to Development 2024-03-13 sprint
Actions #23

Updated by Peter Amstutz 9 months ago

  • Target version changed from Development 2024-03-13 sprint to Development 2024-03-27 sprint
Actions #24

Updated by Peter Amstutz 9 months ago

  • Target version changed from Development 2024-03-27 sprint to Development 2024-04-10 sprint
Actions #25

Updated by Peter Amstutz 9 months ago

  • Tracker changed from Idea to Feature
Actions #26

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-04-10 sprint to Development 2024-04-24 sprint
Actions #27

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-04-24 sprint to Development 2024-05-08 sprint
Actions #28

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-06-05 sprint
Actions #29

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-06-05 sprint to 439
Actions #30

Updated by Peter Amstutz 6 months ago

  • Target version changed from 439 to Development 2024-06-19 sprint
Actions #31

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-06-19 sprint to Development 2024-07-03 sprint
Actions #32

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-07-03 sprint to Development 2024-07-24 sprint
Actions #33

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-07-24 sprint to Development 2024-08-07 sprint
Actions #34

Updated by Peter Amstutz 4 months ago

  • Target version changed from Development 2024-08-07 sprint to Development 2024-08-28 sprint
Actions #35

Updated by Brett Smith 3 months ago

  • Status changed from New to In Progress
Actions #36

Updated by Brett Smith 3 months ago

Questions that have come up so far:

  • Why does bypass_federation require admin? Couldn't you achieve the same result by running the query with cluster_id=<receiving cluster ID>?
  • All of our ThingList resources document next_link, next_page_token, and selfLink properties. However, we don't seem to return these anymore (checked with curl against Playground) and I'm not sure we ever supported them. Okay to remove them?
  • For groups/contents, the API documentation says uuid is required, but the discovery document disagrees. Testing agrees with the discovery document. Calling without uuid seems to just return everything. How do we want to reconcile this?
  • How is include_old_versions supposed to work for collections/get? It seems to be a noop in my testing and it's not documented in the API reference.
  • The documentation for the filters attribute of container requests says it uses the same syntax as the list method. But the filters attribute is a string, while the filters method parameter is an array, so, how the heck does this work? Is it a JSON-serialized string? Why did we do it that way instead of just implementing it as a JsonbType::Array?
Actions #37

Updated by Peter Amstutz 3 months ago

  • Target version changed from Development 2024-08-28 sprint to Development 2024-09-11 sprint
Actions #38

Updated by Brett Smith 3 months ago

19929-fill-discovery-document @ 00184bf486c294ab0f78df9f22013c202c8f1600 - developer-run-tests: #4424

  • All agreed upon points are implemented / addressed.
    • Yes.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • The questions I asked earlier are still outstanding, and represent gaps in the documentation. They could be addressed in review or in a refinement ticket.
    • Something about the new documentation causes pdoc to fail to convert --- to sometimes. It's not totally consistent and I can't figure out the pattern. I would like to deal with it but it didn't seem worth holding up the branch over. --- isn't less readable, just less pretty. We can make a follow-up ticket if we can't figure it out for review.
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • See above. Added a test to verify that the discovery document has complete documentation so we can't forget to add stuff in the future. Also rendered the Python reference and reviewed it manually. It renders fine if not perfectly.
  • Documentation has been updated.
    • Yes
  • Behaves appropriately at the intended scale (describe intended scale).
    • No change in scale
  • Considered backwards and forwards compatibility issues between client and server.
    • No change in compatibility
  • Follows our coding standards and GUI style guidelines.
    • I guess? We haven't really established a style for this documentation. It's close to the Python docstring style, but not an exact match, since something more language-agnostic has different considerations.
Actions #39

Updated by Brett Smith 3 months ago

The branch as written breaks R SDK generation, because autoGenAPI.R writes out R source with the assumption that every description in the discovery document is a single line. If that stops being true, the source it generates has syntax errors.

I need to check whether the same is true of the Java SDK.

It would've been nice to know about this ahead of time. Now I'm committed to more time no matter what I do: either I have to go back and simplify the current documentation, or else improve the R code.

I'm leaving the branch up for review for now because it's at least at a good checkpoint where it would be helfpul to get input.

Actions #40

Updated by Brett Smith 3 months ago

Brett Smith wrote in #note-39:

The branch as written breaks R SDK generation, because autoGenAPI.R writes out R source with the assumption that every description in the discovery document is a single line.

Fortunately the fixes required for this were small. Now at 295b53d946fa18aabc1322125b83906cd74bff96 - developer-run-tests-doc-sdk-java-R: #2225

Actions #41

Updated by Brett Smith 3 months ago

After generating a new Arvados.R with the new discovery document, some tests fail: developer-run-tests-doc-sdk-java-R: #2226 /consoleFull

I note these are all warnings. The generated code builds fine. It's just pointing out gaps in the documentation. I'll see what it takes to address these. Some are more serious than others.

* checking for code/documentation mismatches ... WARNING
Error in vapply(exprs, function(e) as.character(e[[1L]]), "") : 
  values must be length 1,
 but FUN(X[[1]]) result is length 3
Calls: <Anonymous> -> vapply
Execution halted

No clue what this means.

For each method that takes a body, there's a case mismatch between documentation and signature:

Undocumented arguments in documentation object 'api_client_authorizations.create'
  ‘apiclientauthorization’
Documented arguments not in \usage in documentation object 'api_client_authorizations.create':
  ‘apiClientAuthorization’

For each method, it complains that some example variables are not explained:

Objects in \usage without \alias in documentation object 'api_client_authorizations.create':
  ‘$’ ‘arv’

Something is up with arguments that take no methods:

Undocumented arguments in documentation object 'configs.get'
  ‘NULL’

This alias function is not fully documented:

Argument items with no description in Rd object 'projects.list':
  ‘filters’ ‘where’ ‘order’ ‘distinct’ ‘limit’ ‘offset’ ‘count’ ‘uuid’

Obviously supporting non-GNU make is a non-goal for us, but just dealing with the issues might be easier than trying to figure out how to ignore this warning:

Found the following file(s) containing GNU extensions:
  Makefile
Portable Makefiles do not use GNU extensions such as +=, :=, $(shell),
$(wildcard), ifeq ... endif, .NOTPARALLEL See section ‘Writing portable
packages’ in the ‘Writing R Extensions’ manual.
Actions #42

Updated by Brett Smith 3 months ago

  • Related to Feature #21909: Refresh R SDKs based on 3.0 API changes added
Actions #43

Updated by Brett Smith 3 months ago

Brett Smith wrote in #note-41:

[...Error in vapply...]

No clue what this means.

Apparently the documentation checker actually runs inline code and this means something isn't executing as intended. I'm not sure but this might even apply to inline code. The new discovery document marks up literal values with backticks, so it's possible the documentation checker is trying to execute that as R code and failing.

If I'm following that right, I don't think this is a good check for us to be doing.

Actions #44

Updated by Tom Clegg 3 months ago

Brett Smith wrote in #note-39:

I'm leaving the branch up for review for now because it's at least at a good checkpoint where it would be helfpul to get input.

Changes up to 00184bf486 LGTM. So great to see those half-hearted placeholder descriptions finally replaced with something useful!

Just a couple of nits to consider:

when we say "UTC time in ISO 8601 format", it would be more precise to say "UTC date and time in ISO 8601 format" since in general "UTC time" can mean really just the time without date. Confusion seems unlikely, though.

ContainerRequest.cumulative_cost description could say "cost ... to run container(s) to fulfill this container request and any subrequests."

Actions #45

Updated by Tom Clegg 3 months ago

Brett Smith wrote in #note-36:

  • Why does bypass_federation require admin? Couldn't you achieve the same result by running the query with cluster_id=<receiving cluster ID>?

In the case of list methods, it looks like bypass_federation=true predated cluster_id=zzzzz and the newer API is strictly more expressive, with the slight inconvenience of having to get the cluster ID.

In the case of UserUpdate, bypass_federation=true updates the local cached copy of a user whose UUID indicates a remote LoginCluster, and the cluster_id=zzzzz option is not available there because (apart from this weird "update local cache" case) the target UUID indicates which cluster the operation needs to happen on.

That might answer the question of why bypass_federation is admin-only.

  • All of our ThingList resources document next_link, next_page_token, and selfLink properties. However, we don't seem to return these anymore (checked with curl against Playground) and I'm not sure we ever supported them. Okay to remove them?

Yes, I think we should remove them.

  • For groups/contents, the API documentation says uuid is required, but the discovery document disagrees. Testing agrees with the discovery document. Calling without uuid seems to just return everything. How do we want to reconcile this?

I think groups/contents with exclude_home_project=true and no uuid is the only practical way to get "shared with me", so the solution is to fix the API docs to say that if uuid is not provided then all readable objects are in scope.

  • How is include_old_versions supposed to work for collections/get? It seems to be a noop in my testing and it's not documented in the API reference.

If you update a collection with uuid A (and CollectionVersioning is enabled), the old version gets preserved with a newly assigned uuid B. Then, GET /arvados/v1/collections/B returns 404 but GET /arvados/v1/collections/B?include_old_versions=true returns the old version. IIRC the rationale was that it would be too confusing if regular applications could use old versions without noticing (for example, if no current collection has a given PDH, then that PDH should not exist in arv-mount's by_id dir) -- also, if you could GET collection A and it has owner_uuid B, but collection A doesn't appear in the list of collections in project B, that would be confusing.

  • The documentation for the filters attribute of container requests says it uses the same syntax as the list method. But the filters attribute is a string, while the filters method parameter is an array, so, how the heck does this work? Is it a JSON-serialized string? Why did we do it that way instead of just implementing it as a JsonbType::Array?
AFAICT...
  • the filters column/attribute was accidentally left behind while other json columns were converted from from text to json
  • we have never implemented filters, we just store it as text in the database (if it actually worked, presumably we would see the code in Container.find_reusable)
  • we forgot to ever mention in the docs that filters isn't implemented (wow)

I'm not actually sure how/whether it even works to read a container_request with a filters attr. It looks like the Rails response would say "filters":"[...]" which I would expect to fail when unmarshaling to a []Filter in Go. Perhaps it only works when empty because "filters":null does unmarshal to a []Filter in Go.

Actions #46

Updated by Brett Smith 3 months ago

Rebased the branch to take out the R commits, we can just let all that be part of #21909. Now at 69f1c86ada2cd5256cf170ee6da9ceb43d3a9a82 - developer-run-tests: #4429

Tom Clegg wrote in #note-45:

Brett Smith wrote in #note-36:

  • For groups/contents, the API documentation says uuid is required, but the discovery document disagrees. Testing agrees with the discovery document. Calling without uuid seems to just return everything. How do we want to reconcile this?

I think groups/contents with exclude_home_project=true and no uuid is the only practical way to get "shared with me", so the solution is to fix the API docs to say that if uuid is not provided then all readable objects are in scope.

It might've been easier to put two and two together here if exclude_home_project was mentioned in the discovery document at all. Addressed both issues in 2b11547708.

  • How is include_old_versions supposed to work for collections/get? It seems to be a noop in my testing and it's not documented in the API reference.

If you update a collection with uuid A (and CollectionVersioning is enabled), the old version gets preserved with a newly assigned uuid B. Then, GET /arvados/v1/collections/B returns 404 but GET /arvados/v1/collections/B?include_old_versions=true returns the old version.

This is what my testing does not agree with. Using the Python SDK against tordo, I can get an older-version collection by UUID no matter whether include_old_versions is True, False, or unspecified.

I see the code across CollectionsController#show and ArvadosModel#readable_by that is meant to implement this, but grepping the RailsAPI tests for include_old_versions suggests it isn't tested at all…

Actions #47

Updated by Tom Clegg 3 months ago

Brett Smith wrote in #note-46:

If you update a collection with uuid A (and CollectionVersioning is enabled), the old version gets preserved with a newly assigned uuid B. Then, GET /arvados/v1/collections/B returns 404 but GET /arvados/v1/collections/B?include_old_versions=true returns the old version.

This is what my testing does not agree with. Using the Python SDK against tordo, I can get an older-version collection by UUID no matter whether include_old_versions is True, False, or unspecified.

I see the code across CollectionsController#show and ArvadosModel#readable_by that is meant to implement this, but grepping the RailsAPI tests for include_old_versions suggests it isn't tested at all…

I looked closer, and now it looks like the include_old_versions flag was only ever intended to work with collections.list (where it makes more sense because normally you don't want "filter by name" results to include old versions), and we accidentally started advertising it for the collections.get endpoint in #15306 b3d5254ce24ca82904b13d61012b2d8d676a30d8 when we should have only started advertising the include_trash flag.

Actions #48

Updated by Brett Smith 3 months ago

Tom Clegg wrote in #note-47:

I looked closer, and now it looks like the include_old_versions flag was only ever intended to work with collections.list (where it makes more sense because normally you don't want "filter by name" results to include old versions), and we accidentally started advertising it for the collections.get endpoint in #15306 b3d5254ce24ca82904b13d61012b2d8d676a30d8 when we should have only started advertising the include_trash flag.

There's a second convenient historical wrinkle: I happened to notice that this parameter is not advertised in pirca's discovery document—in other words, it's not in a released version. I dug some more and noticed it was advertised on the "show" endpoint, but not the "get" one. The reason it started appearing on tordo/development is because of e49ab1fa2bc83ddb6a3ad3ef6ca4b16391a72094.

So given that this never worked, was never documented, and was never even in the discovery document as a collections.get paremeter in a released version, I think it's pretty safe to remove. Now at fa168f33954820bff58280810b6be01ea49dbc15 - developer-run-tests: #4430

Actions #49

Updated by Tom Clegg 3 months ago

Agreed. LGTM.

Actions #50

Updated by Brett Smith 3 months ago

Brett Smith wrote in #note-48:

So given that this never worked, was never documented, and was never even in the discovery document as a collections.get paremeter in a released version, I think it's pretty safe to remove. Now at fa168f33954820bff58280810b6be01ea49dbc15 - developer-run-tests: #4430

"Just" removing it from the discovery document results in:

  3) Failure:                    
Arvados::V1::CollectionsControllerTest#test_can_get_old_version_collection_by_uuid [/home/brett/Curii/arvados/services/api/test/functional/arvados/v1/collections_controller_test.rb:1385]:
Expected response to be a <2XX: success>, but was a <404: Not Found> 
Response body: {"errors":["Path not found ()"],"error_token":"1725634586+645a97cb"}

So there's more clean-up to do.

Actions #51

Updated by Brett Smith 2 months ago

  • Status changed from In Progress to Resolved
Actions #52

Updated by Brett Smith 2 months ago

  • Release set to 70
Actions

Also available in: Atom PDF