Idea #8884

closed

[Docs] Pipeline author guide gives a basic demonstration of writing a pipeline template

Added by Sarah Guthrie over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assigned To: Bryan Cosca
Category: Documentation
Target version:
Start date: 04/05/2016
Due date:
Story points: 1.0

Description

Write a new wiki page describing how to write a pipeline template (wrapping a git repository and docker image together):

  • How to wrap a git repository containing a crunch script and a docker image into a component
    • Link to "Git Strategy for Pipeline Development" wiki page
  • How to write script_parameters
    • dataclasses and their actual meanings (for example, "Collection" passes in the portable data hash (pdh) string)
  • How to choose runtime_constraints
    • The actual meaning of min_nodes
    • Setting max_tasks_per_node != 1

Subtasks 2 (0 open, 2 closed)

  • Task #8984: Review (Closed, Brett Smith, 04/05/2016)
  • Task #8968: Review (Resolved, Sarah Guthrie, 04/05/2016)
Actions #1

Updated by Sarah Guthrie over 8 years ago

  • Description updated (diff)
  • Story points set to 1.0
Actions #2

Updated by Sarah Guthrie over 8 years ago

  • Target version changed from Pipeline Future Sprints to 2016-04-27 sprint
Actions #3

Updated by Bryan Cosca over 8 years ago

  • Assigned To set to Bryan Cosca
Actions #4

Updated by Bryan Cosca over 8 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Sarah Guthrie over 8 years ago

We should not include "arvados_sdk_version" in the main example, given the reported bugs in installing it. It is enough to mention it and to state that it's a good idea to require the Arvados SDK to be installed in the Docker image.

It would be awesome to connect the script_parameters (required, default, dataclass) with the effects they have on the workbench view (from a user's perspective).

Actions #7

Updated by Bryan Cosca over 8 years ago

I've updated the wiki to remove arvados_sdk_version and added screenshots for script_parameters.

Actions #8

Updated by Sarah Guthrie over 8 years ago

Feel free to link to this page if it will help you at all: https://dev.arvados.org/projects/arvados/wiki/Writing_a_Script_Calling_a_Third_Party_Tool

This page is currently missing a description of "components", which are a fairly large part of pipeline templates. Simply adding a paragraph saying that a pipeline template is composed of a dictionary of components, and that each component maps to a job, would be helpful. We can then state that the rest of the document describes the specific pieces of a component.

The main example still has arvados_sdk_version defined.

The introductory paragraph under "Writing script parameters" should talk about the "required" flag so it doesn't come as a surprise later.

"yields this example" is fairly vague. Saying that a particular pipeline template yields a pipeline instance is more specific and accurate.

How does a dataclass "File" influence the view? What about the dataclass "number"?

The following is inconsistent enough to be confusing:

The inputs tab in the pipeline instance page shows all the required parameters.
...
For the 'additional_params' parameter, since its not required, its in the 'Components' tab, where you can set it:

Maybe try?

The "Inputs" tab in the pipeline instance page shows all the required parameters.
...
The "Components" tab in the pipeline instance page shows all the parameters. Thus it is the only place where non-required parameters, such as 'additional_params' may be set.
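
The rule in the suggested wording can be modeled in a few lines of Python. This is only a sketch of the behavior as described here, not Workbench's actual code: the helper name split_for_tabs is hypothetical, and the parameter set is made up from names mentioned in this ticket.

```python
def split_for_tabs(script_parameters):
    """Return (inputs_tab, components_tab) lists of parameter names.

    Models the wording above: the "Inputs" tab lists required
    parameters only, while the "Components" tab lists every parameter.
    """
    components_tab = list(script_parameters)
    inputs_tab = [
        name
        for name, spec in script_parameters.items()
        if isinstance(spec, dict) and spec.get("required", False)
    ]
    return inputs_tab, components_tab


# Hypothetical parameters for illustration.
params = {
    "input": {"required": True, "dataclass": "Collection"},
    "read_group": {"required": True, "dataclass": "text"},
    "additional_params": {"required": False, "dataclass": "text"},
}

inputs_tab, components_tab = split_for_tabs(params)
print("Inputs tab:", inputs_tab)
print("Components tab:", components_tab)
```

Under this model, 'additional_params' appears only in the second list, which matches the statement that the "Components" tab is the only place to set it.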

Why is "read_group" highlighted in red in the Components tab, but nothing else is? That will likely be distracting to a new viewer.

One runtime constraint is docker_image.

What does docker_image control? You describe hints for it, but don't say what it actually does. Is it required? What needs to be in the docker image?

The max_tasks_per_node parameter will allow you to allocate more computations on your node

What are "computations"? We need to use more specific language here. What happens if max_tasks_per_node is equal to 2? When will multiple jobs be scheduled on that node? When will multiple tasks be scheduled on that node?

Actions #9

Updated by Bryan Cosca over 8 years ago

Why is "read_group" highlighted in red in the Components tab, but nothing else is? That will likely be distracting to a new viewer.

I don't know. I could hack it by making the template set it as false, but this is what it looks like when a Text/number parameter is 'required'.

Everything else has been added.

Actions #10

Updated by Sarah Guthrie over 8 years ago

Alright, I'm happy with this. Brett is up next.

Actions #11

Updated by Bryan Cosca over 8 years ago

  • Status changed from In Progress to Resolved
Actions #12

Updated by Bryan Cosca over 8 years ago

Resolving due to time constraints: Brett does not have time to review, and Sally has already reviewed.
