Idea #19791

closed

Write Python SDK overview

Added by Brett Smith over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Story points: -
Release relationship: Auto

Description

Write a document that explains:

  • The Python SDK dynamically implements interfaces from the Arvados API.
  • Every resource type has a corresponding class.
  • API methods can be called as instance methods.
  • Arguments to methods are passed as keyword arguments.
  • Explain some of the abstractions like calling .execute().
  • Link to documentation for the Arvados API and google-api-python-client for further reading as appropriate.

The goal is for this to be a single document that helps people understand the patterns that the API client follows and how it relates to the Arvados API.

IMO you can implement this by building out the current "Examples" page of the documentation. The orientation should be high-level enough to stay useful even after we have full reference documentation from issue #18799.
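The patterns listed in the description can be sketched roughly as follows. This is a minimal illustration, not text from the resulting document: `fetch_resource` is a hypothetical helper, and the commented usage assumes the `arvados` package is installed with API credentials configured in the environment.

```python
def fetch_resource(arv_client, resource_type, uuid):
    """Get one object by UUID using the resource -> method -> execute pattern.

    fetch_resource is a hypothetical helper written for illustration only.
    """
    # 1. Each resource type is exposed as a method on the client,
    #    e.g. arv_client.collections() or arv_client.groups().
    resource = getattr(arv_client, resource_type)()
    # 2. API methods take their arguments as keyword arguments and
    #    return a request object; nothing has been sent yet.
    request = resource.get(uuid=uuid)
    # 3. execute() sends the request and returns the parsed response.
    return request.execute()

# Assumed usage against a real cluster (the UUID is a placeholder):
#   import arvados
#   arv_client = arvados.api('v1')
#   collection = fetch_resource(arv_client, 'collections', 'zzzzz-4zz18-xxxxxxxxxxxxxxx')
```

The same three steps apply to the other common methods (list, create, update, delete); only the method name and keyword arguments change.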


Subtasks 1 (1 open, 0 closed)

Task #19800: Review 19791-python-api-overview | In Progress | Peter Amstutz | 11/25/2022

Related issues

Related to Arvados - Feature #6865: [Documentation] Higher Level Python SDK Reference Page | Duplicate | 08/03/2015
Actions #1

Updated by Peter Amstutz over 1 year ago

  • Target version set to 2022-12-07 Sprint
Actions #2

Updated by Peter Amstutz over 1 year ago

  • Assigned To set to Brett Smith
Actions #3

Updated by Brett Smith over 1 year ago

  • Status changed from New to In Progress
Actions #4

Updated by Peter Amstutz over 1 year ago

  • In the get section, I would split the 1st and 2nd examples into two code blocks.
  • In the list section, it would be better to direct people to use arvados.util.keyset_list_all to handle paging instead of doing it manually.
  • I think an additional section that describes using group().contents() to get project contents would be a good idea.
  • For create it's better to have the object contents in a field corresponding to the object type, so instead of this:
project = arv_client.groups().create(
    body={
        'name': 'Python SDK Test Project',
        'group_class': 'project',
    },
    ensure_unique_name=True,
).execute()

Do this:

project = arv_client.groups().create(
    body={
        'group': {
            'name': 'Python SDK Test Project',
            'group_class': 'project'
        }
    },
    ensure_unique_name=True,
).execute()

(Background: the second form matches the API definition more closely. The first one works because the Rails API does a fixup that moves fields from the top level into the object, but it runs a small risk of confusion between top-level parameters like 'limit' and 'select' and the object contents.)
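One way to keep object fields apart from top-level request parameters is to build the nested body in one place. The sketch below is only an illustration of the shape recommended in this note; `wrap_body` is a hypothetical helper and not part of the SDK.

```python
def wrap_body(resource_name, fields):
    """Nest object fields under the resource's name (hypothetical helper)."""
    # Copy the fields so later changes to the caller's dict do not leak in.
    return {resource_name: dict(fields)}

body = wrap_body('group', {
    'name': 'Python SDK Test Project',
    'group_class': 'project',
})
# body can then be passed as create(body=body, ensure_unique_name=True),
# so that request parameters such as ensure_unique_name stay at the top
# level while the object's own fields stay under 'group'.
```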

"To modify an existing Arvados object, call the delete method for that resource type."

I think you meant "remove an existing Arvados object" here.

I don't think you want to go into the weeds of setting trash_at here. It would probably be simpler to just explain that some object types go into the trash and can be recovered, and some are deleted immediately. The ones that can be removed from the trash have an untrash method.
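As a rough sketch of the trash-and-recover behavior described here, the helper below deletes an object and then restores it. It assumes a trashable resource type (collections are used as the example) and a hypothetical helper name, `remove_and_restore`; resource types that are deleted immediately have no untrash method, so this would not apply to them.

```python
def remove_and_restore(arv_client, uuid):
    """Trash a collection, then recover it (hypothetical helper).

    Assumes the resource type is trashable, so delete moves the object
    to the trash and untrash brings it back.
    """
    collections = arv_client.collections()
    # delete follows the usual pattern: keyword arguments, then execute().
    trashed = collections.delete(uuid=uuid).execute()
    # untrash is only available on resource types that go to the trash.
    restored = collections.untrash(uuid=uuid).execute()
    return trashed, restored
```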

Actions #5

Updated by Brett Smith over 1 year ago

All addressed @ 7abef669c except:

Peter Amstutz wrote in #note-4:

  • I think an additional section that describes using group().contents() to get project contents would be a good idea.

I think it would provide the best organization to leave this for the planned cookbook update. Right now this page covers just the API's common resource methods, and then mentions multiple times that you can use the same patterns to call other API methods. If we start covering those other methods, I don't know how we draw the line of which are worth including and which aren't. The cookbook already covers that subjective space of "stuff people ask about a lot," so let's just cover it there. (We might consider adding users/current too: that was covered in the old "Examples" page and I took it out for the same reasons.)

Actions #6

Updated by Peter Amstutz over 1 year ago

Sounds good.

If you need to retrieve all of the results for a list, you need to call the list method multiple times with the same search criteria and increasing offset arguments until no more items are returned.

This is slightly wrong: keyset_list_all doesn't use offset. It uses > on order_key, which gives a more stable result if records change mid-paging.

It's probably also worth mentioning that limit sets the maximum page size and defaults to 100, but its value is clamped to 1000. If you expect there could be more than 1000 items, you definitely need to use keyset_list_all.
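To illustrate why keyset paging is preferred, here is a simplified, self-contained sketch over an in-memory list. It is not the implementation of arvados.util.keyset_list_all: `list_page` and `keyset_pages` are hypothetical stand-ins, and the sketch assumes every record has a unique value for the order key.

```python
def list_page(records, order_key, after=None, limit=100):
    """Stand-in for one list call: up to `limit` records with order_key > after."""
    limit = min(limit, 1000)  # mirrors the clamp to 1000 mentioned above
    rows = sorted(records, key=lambda r: r[order_key])
    if after is not None:
        rows = [r for r in rows if r[order_key] > after]
    return rows[:limit]

def keyset_pages(records, order_key, limit=100):
    """Yield every record by repeatedly asking for items past the last one seen."""
    last_seen = None
    while True:
        page = list_page(records, order_key, after=last_seen, limit=limit)
        if not page:
            return
        yield from page
        # No offset is tracked: the next page starts after this key value,
        # so records added or removed earlier in the order do not shift it.
        last_seen = page[-1][order_key]
```

With the real SDK, the recommendation in this thread is to call arvados.util.keyset_list_all with the resource's list method instead of writing a loop like this; check the SDK reference for its exact arguments.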

Rest LGTM

Actions #7

Updated by Brett Smith over 1 year ago

Peter Amstutz wrote in #note-6:

Sounds good.

Cool, I added a note to this effect in a comment to try to help keep future writing in line with this scope.

If you need to retrieve all of the results for a list, you need to call the list method multiple times with the same search criteria and increasing offset arguments until no more items are returned.

This is slightly wrong: keyset_list_all doesn't use offset. It uses > on order_key, which gives a more stable result if records change mid-paging.

I just took this out, I feel like this rationale is left over from the previous draft and not really helpful given the recommendation to just use keyset_list_all.

It's probably also worth mentioning that limit sets the maximum page size and defaults to 100, but its value is clamped to 1000. If you expect there could be more than 1000 items, you definitely need to use keyset_list_all.

Done @ 9e44b1581.

Actions #8

Updated by Peter Amstutz over 1 year ago

This LGTM.

Actions #9

Updated by Brett Smith over 1 year ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #10

Updated by Peter Amstutz over 1 year ago

  • Release set to 47
Actions #11

Updated by Brett Smith over 1 year ago

  • Related to Feature #6865: [Documentation] Higher Level Python SDK Reference Page added