Project

General

Profile

Actions

Idea #19792

closed

Python SDK cookbook explains its examples better

Added by Brett Smith over 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
SDKs
Target version:
Story points:
1.0
Release relationship:
Auto

Description

The examples in the Python SDK cookbook are good illustrations of common tasks, but there isn't much in there to help the reader understand why the code is structured the way it is, or how they might adapt the code to their particular use case. Expand the cookbook to improve this situation.

  • Organize the recipes into related sections.
  • Add a preface to each recipe to detail what it does and give a use case for when you might want it.
  • Add comments to the code to explain tricky parts or suggest how to make common changes.
  • Structure the code to prioritize follow-ability. Use variable names that are as self-documenting as possible. Avoid doing multiple things on one line of code.

Subtasks 1 (0 open1 closed)

Task #19855: Review 19792-pysdk-cookbookResolvedPeter Amstutz01/02/2023Actions

Related issues

Related to Arvados - Feature #6865: [Documentation] Higher Level Python SDK Reference PageDuplicate08/03/2015Actions
Actions #1

Updated by Peter Amstutz over 1 year ago

  • Target version set to 2022-12-07 Sprint
Actions #2

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint
Actions #3

Updated by Brett Smith over 1 year ago

  • Story points set to 1.0
Actions #4

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint
Actions #5

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2023-01-18 sprint to 2022-12-21 Sprint
Actions #6

Updated by Peter Amstutz over 1 year ago

  • Assigned To set to Brett Smith
Actions #7

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint
Actions #8

Updated by Brett Smith over 1 year ago

  • Status changed from New to In Progress
Actions #9

Updated by Brett Smith over 1 year ago

  • Related to Feature #6865: [Documentation] Higher Level Python SDK Reference Page added
Actions #11

Updated by Peter Amstutz about 1 year ago

  • Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
Actions #12

Updated by Brett Smith about 1 year ago

19792-pysdk-cookbook @ dfa06386290a17fb768c359de12719b170f02a72 has been updated with a little reorganization and a few more recipes based on user feedback.

Actions #13

Updated by Peter Amstutz about 1 year ago

19792-pysdk-cookbook @ dfa06386290a17fb768c359de12719b170f02a72

This is really good, thank you for putting time into this.

A few notes

Working with permissions

name is one of "none", "can_read", "can_write", "can_manage", or "can_login".

"none" is not actually a valid value

the relevant deep link might be https://doc.arvados.org/v2.5/api/permission-model.html#links

Not sure if the additional complexity of iterating over permissions is needed. A narrower example that modifies just one specific permission involving a user and a specific object would probably be easier.

Now that we have permission link de-duplication, for a given head/tail there will only ever be one or zero links, so while you still need to use links.list() you can just use if respose["items"]: ... field and assume it will only ever have either 1 or 0 elements.

... to be continued

Actions #14

Updated by Brett Smith about 1 year ago

I don't feel super strongly about any of this stuff but just to explain my reasoning behind some of the decisions:

Peter Amstutz wrote in #note-13:

the relevant deep link might be https://doc.arvados.org/v2.5/api/permission-model.html#links

I agree that's the most specific link, but I felt like it was important to understand the underlying permissions model as well, so the whole page seemed on point.

Not sure if the additional complexity of iterating over permissions is needed. A narrower example that modifies just one specific permission involving a user and a specific object would probably be easier.

When I made this decision I was thinking about what an orchestration script was likely to do. For scripts manipulating things like collections and container requests, it makes sense they could have a UUID that the user pasted in from Workbench or whatever. Since links are less friendly to browse that seems much less likely, and scripts are more likely to find the link(s) they want to manipulate by searching for it.

Now that we have permission link de-duplication, for a given head/tail there will only ever be one or zero links, so while you still need to use links.list() you can just use if respose["items"]: ... field and assume it will only ever have either 1 or 0 elements.

As they're currently written, each example only searches one half of head/tail UUID, so there still may be more than one link.

Also, link de-duplication hasn't even been in a release yet, and given the nature of this documentation I expect we'll show it to users who maybe haven't upgraded to the latest version yet.

Even when it's "not necessary," it seems nicer for the cookbook to stay consistent and always use keyset_list_all. Calling the list API directly may leave the user wondering why this is a special case, and explaining the distinction seems more likely to be confusing than valuable to me. I don't think there's anything wrong with writing a loop that may only ever iterate 0 or 1 items.

Actions #15

Updated by Peter Amstutz about 1 year ago

Introduction

It could use one sentence like "This page demonstrates how to do common operations using the Python SDK" to provide a tiny bit of context.

General comments

  1. Do not call the method here, just pass it.

I would rephrase this slightly to explain what is happening, e.g. "pass the query method that will be called by list_all"

I find my eyes are drawn to the code blocks, it's easy to skim or completely ignore the smaller text above it when it is more than one sentence. I would move notes about what the exact code is doing into a comment inside the code block, e.g. "This example changes all can_write permissions granted on object XYZ to can_read".

Properties

I agree with merging the discussion about vocabulary into discussion of properties. I think there should be a line that explains what vocabularies are, links to other documentation, and says this is a optional feature that may or may not be enabled on your site.

(This also makes me think that it should be possible to write scripts that behave reasonably whether vocabulary is enabled or not, I'm not entirely sure that is the case right now, but that is a separate question).

Collections

I really like the shutil.copyfileobj(src_file, dst_file) trick. That is a neat solution to something that's been a source of complaints.

Instead of "Read a file from a collection" and "Write a file to a collection" maybe say "Download" and "Upload"? I feel like those are the magic words people are likely to be looking for. Or even split out "Read" and "Download" and "Write" and "Upload" examples into separate sections, since those sections already contain multiple code blocks.

Delete a file -- I would show it using save() not save_new() since it doesn't particularly make sense that you would delete a file in a collection you just created.

Containers

Showing how to list all the mounts and the section that specifically shows using cwl.input.json should be separate sections.

Same for outputs.

I think a running theme here is that each freestanding code block should probably go in its own section.

Actions #16

Updated by Brett Smith about 1 year ago

Now at 1da341c2da87314a7d42a653fcdfb3d51d332ce2 with all comments addressed except:

Peter Amstutz wrote in #note-15:

General comments

  1. Do not call the method here, just pass it.

I would rephrase this slightly to explain what is happening, e.g. "pass the query method that will be called by list_all"

I've reworded this a little, although I've still kept the "don't call it here" warning. My thinking here is, this cookbook may be the first time some of our readers are introduced to the concept of passing a function/method around as an object without calling it, so I want to call this out to help them do the right thing. keyset_list_all is already documented in the general API client introduction, and the Introduction here specifically says you need to be familiar with that material, so I don't think it's necessary for the cookbook to go into too much detail about how this works.

I find my eyes are drawn to the code blocks, it's easy to skim or completely ignore the smaller text above it when it is more than one sentence. I would move notes about what the exact code is doing into a comment inside the code block, e.g. "This example changes all can_write permissions granted on object XYZ to can_read".

I agree it's easy to miss the text and that's not ideal. Moving the text into comments comes with its own problems, namely it can't have markup, which means we can't link to additional background documentation, which the cookbook does a lot. That feels worse overall to me.

Some style tweaks, like more spacing between the header+text+codeblock, might be a better solution overall. However, doing that in a way that doesn't degrade other docs would take some forethought and fine-tuning. With the change to split out more code blocks into separate sections, more sections have more introductory text, which makes it a little harder to miss. I propose we let this ride and consider a separate ticket for further improvements.

I really like the shutil.copyfileobj(src_file, dst_file) trick. That is a neat solution to something that's been a source of complaints.

I'm not sure how using a stdlib function in the documented way is a "trick," but I'm glad you like it. :)

Delete a file -- I would show it using save() not save_new() since it doesn't particularly make sense that you would delete a file in a collection you just created.

That doesn't, but it does make sense to open an existing collection, delete something from it, and save a new collection from the result. My overarching rationale here is: the cookbook can't really know whether the reader wants to update an existing collection or save a new one. For less experienced Arvados users, it is easier to figure out how to recover from "I created a new collection when I meant to update one" rather than the opposite situation. Therefore, for the sake of both caution and consistency, all the examples use save_new and call out save as an alternative. I think this leads to less stress if the reader copies the code without understanding and gets a result they don't want.

Actions #17

Updated by Peter Amstutz about 1 year ago

Should the Collection examples be explicit about the IO mode? i.e. "rb" and "wb" or "wt"

Rest LGTM, please merge

Actions #18

Updated by Brett Smith about 1 year ago

Peter Amstutz wrote in #note-17:

Should the Collection examples be explicit about the IO mode? i.e. "rb" and "wb" or "wt"

I didn't change the examples but I did call out 'wb' as another possible mode in the intro text. I think some readers won't recognize 'wt' since t is the default. The cookbook doesn't need to, and shouldn't, document every detail of the SDK. We can work on improving our docstrings for that. I hope this change strikes a nice balance between hinting we have full mode support, while leaving the full docs for the docstrings when we get to that point. Thanks.

Actions #19

Updated by Brett Smith about 1 year ago

  • Status changed from In Progress to Resolved
Actions #20

Updated by Peter Amstutz about 1 year ago

  • Release set to 57
Actions

Also available in: Atom PDF