Pipeline template development » History » Version 2

Bryan Cosca, 04/19/2016 07:55 PM

h1. Pipeline template development

This page describes how to write a pipeline template. The "run-command tutorial on doc.arvados.org":http://doc.arvados.org/user/tutorials/running-external-program.html covers writing a pipeline template that uses run-command.

The components section of a template with a single job has the following shape:

<pre>
"components": {
 "JobName": {
  "script": "JobScript",
  "script_version": "master",
  "repository": "yourname/yourname",
  "script_parameters": {
   "CollectionOne": {
    "required": true,
    "dataclass": "Collection"
   },
   "ParameterOne": {
    "required": true,
    "dataclass": "text",
    "default": "ParameterOneString"
   }
  },
  "runtime_constraints": {
   "docker_image": "bcosc/arv-base-java",
   "arvados_sdk_version": "master"
  }
 }
}
</pre>
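
As a rough illustration, the fragment above can be loaded and inspected with a short Python script. The outer braces added around the fragment and the helper name required_inputs are assumptions made for this sketch; they are not part of the Arvados SDK. The helper lists the parameters that are required and have no default, which are presumably the ones a user has to fill in.

```python
import json

# The fragment from above, wrapped in braces so that it parses as JSON.
TEMPLATE_TEXT = """
{
 "components": {
  "JobName": {
   "script": "JobScript",
   "script_version": "master",
   "repository": "yourname/yourname",
   "script_parameters": {
    "CollectionOne": {"required": true, "dataclass": "Collection"},
    "ParameterOne": {"required": true, "dataclass": "text",
                     "default": "ParameterOneString"}
   },
   "runtime_constraints": {
    "docker_image": "bcosc/arv-base-java",
    "arvados_sdk_version": "master"
   }
  }
 }
}
"""


def required_inputs(template):
    """Hypothetical helper: return (component, parameter) pairs that are
    marked required and carry no default value."""
    missing = []
    for job_name, job in template["components"].items():
        for param_name, spec in job.get("script_parameters", {}).items():
            if spec.get("required") and "default" not in spec:
                missing.append((job_name, param_name))
    return missing


template = json.loads(TEMPLATE_TEXT)
print(required_inputs(template))
```

Running a check like this before uploading a template is one way to catch JSON syntax errors early.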

How to wrap a git repository containing a crunch script and a Docker image into a component

Link to "Git Strategy for Pipeline Development" wiki page

h3. Writing script_parameters

"Script_parameters":http://doc.arvados.org/api/schema/PipelineTemplate.html are the inputs that your crunch script can read. Each script parameter has a dataclass, which is one of Collection, File, number, or text. Collection passes in the portable data hash (pdh) string (e.g. 39c6f22d40001074f4200a72559ae7eb+5745), File passes in the path to a file inside a collection (e.g. 39c6f22d40001074f4200a72559ae7eb+5745/foo.txt), number passes in an integer, and text passes in any string.
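
The four dataclasses can be pictured as simple checks on the value that is passed in. The sketch below follows only the description above; the function name matches_dataclass and the pdh pattern (32 hexadecimal digits, a plus sign, then a size) are assumptions for illustration and are not taken from the Arvados SDK.

```python
import re

# Assumed pdh shape: 32 hex digits, "+", then a decimal size.
PDH_RE = re.compile(r"^[0-9a-f]{32}\+\d+$")


def matches_dataclass(dataclass, value):
    """Hypothetical check that a value fits the named dataclass."""
    if dataclass == "Collection":
        return isinstance(value, str) and PDH_RE.match(value) is not None
    if dataclass == "File":
        if not isinstance(value, str) or "/" not in value:
            return False
        # Split the pdh from the path inside the collection.
        pdh, path = value.split("/", 1)
        return PDH_RE.match(pdh) is not None and path != ""
    if dataclass == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if dataclass == "text":
        return isinstance(value, str)
    return False


print(matches_dataclass("Collection", "39c6f22d40001074f4200a72559ae7eb+5745"))
print(matches_dataclass("File", "39c6f22d40001074f4200a72559ae7eb+5745/foo.txt"))
```

Both calls use the example values from the paragraph above.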

The default parameter is useful for pre-filling a collection that will most likely be used, so that the user does not have to enter it manually. An example is a reference genome collection that is used throughout the entire pipeline.

The title and description parameters are useful for showing what the script parameter does, but they are not required.
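
A script parameter that uses default, title, and description might look like the sketch below. The parameter name ReferenceGenome, the title and description strings, and the resolve helper are all hypothetical, and the pdh is only a placeholder borrowed from the example earlier on this page.

```python
import json

# Hypothetical script parameter; every value here is a placeholder.
reference_genome = {
    "required": True,
    "dataclass": "Collection",
    "default": "39c6f22d40001074f4200a72559ae7eb+5745",
    "title": "Reference genome",
    "description": "Collection holding the reference used by every job.",
}


def resolve(spec, user_value=None):
    """Return the user's value if one was given, otherwise the default."""
    if user_value is not None:
        return user_value
    if "default" in spec:
        return spec["default"]
    if spec.get("required"):
        raise ValueError("required parameter has no value and no default")
    return None


# Print the parameter as it would appear under script_parameters.
print(json.dumps({"ReferenceGenome": reference_genome}, indent=1))
```

With a default in place, a required parameter no longer forces the user to type anything, which is the convenience described above.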

h3. Writing runtime_constraints

"Runtime_constraints":http://doc.arvados.org/api/schema/Job.html are job inputs that help choose the properties of the nodes your pipeline will run on. The "Pipeline_Optimization wiki":https://dev.arvados.org/projects/arvados/wiki/Pipeline_Optimization describes how to optimize these parameters.
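
For illustration, a runtime_constraints block could be assembled as in the sketch below. The docker_image and arvados_sdk_version values come from the example earlier on this page. The keys min_cores_per_node and min_ram_mb_per_node are given as recalled from the Job schema linked above and should be checked against that page, and all numeric values are arbitrary. The ram_per_task_mb helper is hypothetical and assumes that tasks on one node share its memory evenly.

```python
import json

# Illustrative values only; tune them for the actual workload.
runtime_constraints = {
    "docker_image": "bcosc/arv-base-java",
    "arvados_sdk_version": "master",
    "min_nodes": 1,
    "min_cores_per_node": 2,      # key name should be verified in the schema
    "min_ram_mb_per_node": 4096,  # key name should be verified in the schema
    "max_tasks_per_node": 1,
}


def ram_per_task_mb(constraints):
    """Hypothetical estimate: node RAM divided evenly among its tasks."""
    tasks = constraints.get("max_tasks_per_node", 1)
    return constraints["min_ram_mb_per_node"] // tasks


# Print the block as it would appear inside a component.
print(json.dumps({"runtime_constraints": runtime_constraints}, indent=1))
print(ram_per_task_mb(runtime_constraints))
```

Under that assumption, raising max_tasks_per_node above 1 lowers the memory available to each task unless min_ram_mb_per_node is raised as well.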

The actual meaning of min_nodes

Setting max_tasks_per_node != 1