Idea #19792: Python SDK cookbook explains its examples better - Arvados

Actions

Copy link

Idea #19792

closed

Python SDK cookbook explains its examples better

Added by Brett Smith over 2 years ago. Updated about 2 years ago.

Status:

Resolved

Priority:

Normal

Assigned To:

Brett Smith

Category:

SDKs

Target version:

2023-02-01 sprint

Start date:

01/02/2023

Due date:

Story points:

1.0

Release:

Arvados 2.6.0

Release relationship:

Auto

Description

The examples in the Python SDK cookbook are good illustrations of common tasks, but there isn't much in there to help the reader understand why the code is structured the way it is, or how they might adapt the code to their particular use case. Expand the cookbook to improve this situation.

Organize the recipes into related sections.
Add a preface to each recipe to detail what it does and give a use case for when you might want it.
Add comments to the code to explain tricky parts or suggest how to make common changes.
Structure the code to prioritize follow-ability. Use variable names that are as self-documenting as possible. Avoid doing multiple things on one line of code.

Subtasks 1 (0 open — 1 closed)

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by Peter Amstutz over 2 years ago

Target version set to 2022-12-07 Sprint

Actions

Copy link

Updated by Peter Amstutz over 2 years ago

Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint

Actions

Copy link

Updated by Brett Smith over 2 years ago

Story points set to 1.0

Actions

Copy link

Updated by Peter Amstutz over 2 years ago

Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint

Actions

Copy link

Updated by Peter Amstutz over 2 years ago

Target version changed from 2023-01-18 sprint to 2022-12-21 Sprint

Actions

Copy link

Updated by Peter Amstutz over 2 years ago

Assigned To set to Brett Smith

Actions

Copy link

Updated by Peter Amstutz over 2 years ago

Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint

Actions

Copy link

Updated by Brett Smith over 2 years ago

Status changed from New to In Progress

Actions

Copy link

Updated by Brett Smith over 2 years ago

Related to Feature #6865: [Documentation] Higher Level Python SDK Reference Page added

Actions

Copy link

#11

Updated by Peter Amstutz about 2 years ago

Target version changed from 2023-01-18 sprint to 2023-02-01 sprint

Actions

Copy link

#12

Updated by Brett Smith about 2 years ago

19792-pysdk-cookbook @ dfa06386290a17fb768c359de12719b170f02a72 has been updated with a little reorganization and a few more recipes based on user feedback.

Actions

Copy link

#13

Updated by Peter Amstutz about 2 years ago

19792-pysdk-cookbook @ dfa06386290a17fb768c359de12719b170f02a72

This is really good, thank you for putting time into this.

A few notes

Working with permissions¶

name is one of "none", "can_read", "can_write", "can_manage", or "can_login".

"none" is not actually a valid value

the relevant deep link might be https://doc.arvados.org/v2.5/api/permission-model.html#links

Not sure if the additional complexity of iterating over permissions is needed. A narrower example that modifies just one specific permission involving a user and a specific object would probably be easier.

Now that we have permission link de-duplication, for a given head/tail there will only ever be one or zero links, so while you still need to use links.list() you can just use if respose["items"]: ... field and assume it will only ever have either 1 or 0 elements.

... to be continued

Actions

Copy link

#14

Updated by Brett Smith about 2 years ago

I don't feel super strongly about any of this stuff but just to explain my reasoning behind some of the decisions:

Peter Amstutz wrote in #note-13:

the relevant deep link might be https://doc.arvados.org/v2.5/api/permission-model.html#links

I agree that's the most specific link, but I felt like it was important to understand the underlying permissions model as well, so the whole page seemed on point.

Not sure if the additional complexity of iterating over permissions is needed. A narrower example that modifies just one specific permission involving a user and a specific object would probably be easier.

When I made this decision I was thinking about what an orchestration script was likely to do. For scripts manipulating things like collections and container requests, it makes sense they could have a UUID that the user pasted in from Workbench or whatever. Since links are less friendly to browse that seems much less likely, and scripts are more likely to find the link(s) they want to manipulate by searching for it.

Now that we have permission link de-duplication, for a given head/tail there will only ever be one or zero links, so while you still need to use links.list() you can just use if respose["items"]: ... field and assume it will only ever have either 1 or 0 elements.

As they're currently written, each example only searches one half of head/tail UUID, so there still may be more than one link.

Also, link de-duplication hasn't even been in a release yet, and given the nature of this documentation I expect we'll show it to users who maybe haven't upgraded to the latest version yet.

Even when it's "not necessary," it seems nicer for the cookbook to stay consistent and always use keyset_list_all. Calling the list API directly may leave the user wondering why this is a special case, and explaining the distinction seems more likely to be confusing than valuable to me. I don't think there's anything wrong with writing a loop that may only ever iterate 0 or 1 items.

Actions

Copy link

#15

Updated by Peter Amstutz about 2 years ago

Introduction¶

It could use one sentence like "This page demonstrates how to do common operations using the Python SDK" to provide a tiny bit of context.

General comments¶

Do not call the method here, just pass it.

I would rephrase this slightly to explain what is happening, e.g. "pass the query method that will be called by list_all"

I find my eyes are drawn to the code blocks, it's easy to skim or completely ignore the smaller text above it when it is more than one sentence. I would move notes about what the exact code is doing into a comment inside the code block, e.g. "This example changes all can_write permissions granted on object XYZ to can_read".

Properties¶

I agree with merging the discussion about vocabulary into discussion of properties. I think there should be a line that explains what vocabularies are, links to other documentation, and says this is a optional feature that may or may not be enabled on your site.

(This also makes me think that it should be possible to write scripts that behave reasonably whether vocabulary is enabled or not, I'm not entirely sure that is the case right now, but that is a separate question).

Collections¶

I really like the shutil.copyfileobj(src_file, dst_file) trick. That is a neat solution to something that's been a source of complaints.

Instead of "Read a file from a collection" and "Write a file to a collection" maybe say "Download" and "Upload"? I feel like those are the magic words people are likely to be looking for. Or even split out "Read" and "Download" and "Write" and "Upload" examples into separate sections, since those sections already contain multiple code blocks.

Delete a file -- I would show it using save() not save_new() since it doesn't particularly make sense that you would delete a file in a collection you just created.

Containers¶

Showing how to list all the mounts and the section that specifically shows using cwl.input.json should be separate sections.

Same for outputs.

I think a running theme here is that each freestanding code block should probably go in its own section.

Actions

Copy link

#16

Updated by Brett Smith about 2 years ago

Now at 1da341c2da87314a7d42a653fcdfb3d51d332ce2 with all comments addressed except:

Peter Amstutz wrote in #note-15:

General comments¶

Do not call the method here, just pass it.

I would rephrase this slightly to explain what is happening, e.g. "pass the query method that will be called by list_all"

I've reworded this a little, although I've still kept the "don't call it here" warning. My thinking here is, this cookbook may be the first time some of our readers are introduced to the concept of passing a function/method around as an object without calling it, so I want to call this out to help them do the right thing. keyset_list_all is already documented in the general API client introduction, and the Introduction here specifically says you need to be familiar with that material, so I don't think it's necessary for the cookbook to go into too much detail about how this works.

I find my eyes are drawn to the code blocks, it's easy to skim or completely ignore the smaller text above it when it is more than one sentence. I would move notes about what the exact code is doing into a comment inside the code block, e.g. "This example changes all can_write permissions granted on object XYZ to can_read".

I agree it's easy to miss the text and that's not ideal. Moving the text into comments comes with its own problems, namely it can't have markup, which means we can't link to additional background documentation, which the cookbook does a lot. That feels worse overall to me.

Some style tweaks, like more spacing between the header+text+codeblock, might be a better solution overall. However, doing that in a way that doesn't degrade other docs would take some forethought and fine-tuning. With the change to split out more code blocks into separate sections, more sections have more introductory text, which makes it a little harder to miss. I propose we let this ride and consider a separate ticket for further improvements.

I really like the shutil.copyfileobj(src_file, dst_file) trick. That is a neat solution to something that's been a source of complaints.

I'm not sure how using a stdlib function in the documented way is a "trick," but I'm glad you like it. :)

Delete a file -- I would show it using save() not save_new() since it doesn't particularly make sense that you would delete a file in a collection you just created.

That doesn't, but it does make sense to open an existing collection, delete something from it, and save a new collection from the result. My overarching rationale here is: the cookbook can't really know whether the reader wants to update an existing collection or save a new one. For less experienced Arvados users, it is easier to figure out how to recover from "I created a new collection when I meant to update one" rather than the opposite situation. Therefore, for the sake of both caution and consistency, all the examples use save_new and call out save as an alternative. I think this leads to less stress if the reader copies the code without understanding and gets a result they don't want.

Actions

Copy link

#17

Updated by Peter Amstutz about 2 years ago

Should the Collection examples be explicit about the IO mode? i.e. "rb" and "wb" or "wt"

Rest LGTM, please merge

Actions

Copy link

#18

Updated by Brett Smith about 2 years ago

Peter Amstutz wrote in #note-17:

Should the Collection examples be explicit about the IO mode? i.e. "rb" and "wb" or "wt"

I didn't change the examples but I did call out 'wb' as another possible mode in the intro text. I think some readers won't recognize 'wt' since t is the default. The cookbook doesn't need to, and shouldn't, document every detail of the SDK. We can work on improving our docstrings for that. I hope this change strikes a nice balance between hinting we have full mode support, while leaving the full docs for the docstrings when we get to that point. Thanks.

Actions

Copy link

#19

Updated by Brett Smith about 2 years ago

Status changed from In Progress to Resolved

Applied in changeset arvados|674630a6f461527f5b26e917814736b444cb4f51.

Actions

Copy link

#20

Updated by Peter Amstutz about 2 years ago

Release set to 57

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Arvados

Custom queries

Idea #19792

Python SDK cookbook explains its examples better

Updated by Peter Amstutz over 2 years ago

Updated by Peter Amstutz over 2 years ago

Updated by Brett Smith over 2 years ago

Updated by Peter Amstutz over 2 years ago

Updated by Peter Amstutz over 2 years ago

Updated by Peter Amstutz over 2 years ago

Updated by Peter Amstutz over 2 years ago

Updated by Brett Smith over 2 years ago

Updated by Brett Smith over 2 years ago

Updated by Peter Amstutz about 2 years ago

Updated by Brett Smith about 2 years ago

Updated by Peter Amstutz about 2 years ago

Working with permissions¶

Updated by Brett Smith about 2 years ago

Updated by Peter Amstutz about 2 years ago

Introduction¶

General comments¶

Properties¶

Collections¶

Containers¶

Updated by Brett Smith about 2 years ago

General comments¶

Updated by Peter Amstutz about 2 years ago

Updated by Brett Smith about 2 years ago

Updated by Brett Smith about 2 years ago

Updated by Peter Amstutz about 2 years ago