Bryan Cosca, 04/19/2016 07:55 PM

h1. Pipeline template development

This page describes how to write a pipeline template. See "doc.arvados.org":http://doc.arvados.org/user/tutorials/running-external-program.html for documentation on writing a pipeline template that uses run-command. The components section of a template has the following general form:

<pre>
"components": {
 "JobName": {
  "script": "JobScript",
  "script_version": "master",
  "repository": "yourname/yourname",
  "script_parameters": {
   "CollectionOne": {
    "required": true,
    "dataclass": "Collection"
   },
   "ParameterOne": {
    "required": true,
    "dataclass": "text",
    "default": "ParameterOneString"
   }
  },
  "runtime_constraints": {
   "docker_image": "bcosc/arv-base-java",
   "arvados_sdk_version": "master"
  }
 }
}
</pre>

How to wrap a git repository containing a crunch script and a docker image into a component
Link to "Git Strategy for Pipeline Development" wiki page

h3. Writing script_parameters

"Script_parameters":http://doc.arvados.org/api/schema/PipelineTemplate.html are inputs that your crunch script can read. Each script parameter has a dataclass: Collection, File, number, or text. Collection passes in the portable data hash (pdh) of a collection (e.g. 39c6f22d40001074f4200a72559ae7eb+5745), File passes in the path to a file inside a collection (e.g. 39c6f22d40001074f4200a72559ae7eb+5745/foo.txt), number passes in an integer, and text passes in a string.

The default attribute is useful for pre-filling a value that will most likely be used, so that the user does not have to enter it manually. For example, it can point to a reference genome collection that is used throughout the entire pipeline.

The title and description attributes are useful for explaining what a script parameter does, but they are optional.
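As an illustration of the attributes described in this section, a script_parameters block might look like the sketch below. The parameter names (ReferenceGenome, InputFile, Threads) and the title and description strings are hypothetical. The portable data hash is the example value used earlier on this page and stands in for a real reference collection.

<pre>
"script_parameters": {
 "ReferenceGenome": {
  "required": true,
  "dataclass": "Collection",
  "title": "Reference genome",
  "description": "Collection holding the reference genome used throughout the pipeline",
  "default": "39c6f22d40001074f4200a72559ae7eb+5745"
 },
 "InputFile": {
  "required": true,
  "dataclass": "File",
  "title": "Input file",
  "description": "Path to a single file inside a collection"
 },
 "Threads": {
  "required": true,
  "dataclass": "number"
 }
}
</pre>

In this sketch only ReferenceGenome carries a default, so a user running the pipeline would be expected to supply InputFile and Threads and could leave the reference genome as it is or override it.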

h3. Writing runtime_constraints

"Runtime_constraints":http://doc.arvados.org/api/schema/Job.html are job settings that determine the properties of the compute nodes your pipeline runs on. Guidance on tuning them is on the "Pipeline_Optimization wiki":https://dev.arvados.org/projects/arvados/wiki/Pipeline_Optimization page.

The actual meaning of min_nodes
Setting max_tasks_per_node != 1
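For illustration, a runtime_constraints block that sets node requirements might look like the following sketch. The field names are believed to match the Job schema linked in this section and should be checked against that page. The numeric values are arbitrary placeholders, not recommendations, and the sketch does not settle the two open topics listed just above it.

<pre>
"runtime_constraints": {
 "docker_image": "bcosc/arv-base-java",
 "arvados_sdk_version": "master",
 "min_nodes": 1,
 "max_tasks_per_node": 1,
 "min_cores_per_node": 2,
 "min_ram_mb_per_node": 4000
}
</pre>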