Git strategy for pipeline development » History » Version 2

Tom Clegg, 02/17/2016 11:47 PM

h1. Git strategy for pipeline development

The following scenario is common:
* You have a project that involves one or two pipelines
* Each pipeline has many components
* These pipelines and components make use of a common code base

Example:

|crunch scripts in repo|pipeline A|pipeline B|
|<pre>crunch_scripts/align
crunch_scripts/call
crunch_scripts/compare</pre>|<pre>align(1)    align(2)
   |           |
call(1)     call(2)
   |____   ____|
        | |
      compare</pre>|<pre>align(1)    align(2)    align(3)
   |           |           |
call(1)     call(2)     call(3)
   |_________  |  _________|
             | | |
            compare</pre>|

While developing the code you can expect to have moments like these:
* Find a bug in @compare@ that was making it fail when given 3 inputs.
* Fix the code in commit "C", push, and re-run. (Note: results from previous runs _that succeeded_ are still valid.)
* Find a bug in @compare@ that was making it produce incorrect output when given 3 inputs.
* Fix the code in commit "E", push, and re-run. Update pipeline template B to prevent the broken jobs (the ones that are marked "success" but produced incorrect outputs) from being re-used in future pipeline runs. (Note: results from previous jobs from pipeline A are still OK.)

h2. A strategy

The *master branch* has the latest stable version of everything.

For each component of each template, *tag* the oldest acceptable version.

<pre>
tags →  pipelineA-compare   pipelineB-compare
                        ↓                   ↓
commits →               A----B----C----D----E----F

branches →                                       master
</pre>
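As a rough sketch of how the tags in the diagram could be created, the commands below build a throwaway repository with six empty commits standing in for A through F, then tag A and E. The temporary directory, the commit messages, the user identity, and the @commit_for@ helper are assumptions made for illustration only. In a real repository you would tag existing commits and publish the tags to whatever remote your jobs read from.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for the crunch script repo (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Six empty commits play the role of commits A..F in the diagram.
for name in A B C D E F; do
  git commit -q --allow-empty -m "$name"
done

# Hypothetical helper: look up a commit hash by its one-letter message.
commit_for() {
  git log --format='%H %s' | awk -v m="$1" '$2 == m { print $1 }'
}

# Tag the oldest acceptable version of "compare" for each pipeline.
git tag pipelineA-compare "$(commit_for A)"
git tag pipelineB-compare "$(commit_for E)"

# Show where the tags ended up.
git log --oneline --decorate
```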

Tell your pipeline (or other job creation script) to
* re-use existing jobs as long as they used the version tagged @pipelineB-compare@ or a newer one
* use "master" if no existing job is suitable

Or, you might use tags like
* "compare" -- just use this tag for all pipelines that use the "compare" script. When you find a bug that produced bad output, just move the tag, and all pipelines will stop re-using jobs that ran the buggy code.
* "compare-3way-bugfix" -- tag each bugfix, and add the tags to the pipelines where the bug could be a problem. Of course this presumes you'd rather trust yourself to keep track of which pipelines need which bugfixes than waste resources re-generating perfectly good outputs.
* "ok" -- tag the whole repo, and use this as the earliest acceptable revision for all jobs/components. This is safer: if you fix library code in file C in order to fix job A, and forget that script B also uses code from file C, everything is fine because all jobs that used any older revision of the repo will be ineligible for reuse.
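Moving a tag, as the "compare" option above requires, could look roughly like the following. This is a minimal sketch in a throwaway repository: the commit messages and user identity are made up, and the commented-out push assumes a remote named @origin@.

```shell
#!/bin/sh
set -e

# Throwaway repository (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

git commit -q --allow-empty -m "compare: initial version"
git tag compare   # oldest acceptable version so far

git commit -q --allow-empty -m "compare: fix incorrect output for 3 inputs"

# Move the tag to the fix. -f is needed because the tag already exists.
git tag -f compare HEAD

# In a shared repository the moved tag must also be force-pushed, e.g.
# (assuming a remote called "origin"):
#   git push -f origin compare

# Confirm which commit the tag now names.
git log -1 --format=%s compare
```

Anyone who already fetched the old tag keeps their local copy until they fetch it again with force, so moved tags are worth announcing to collaborators.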
53
54
The way you specify the a range of acceptable revisions is a bit weird, but here it is:
55
56
<pre><code class="sh">
57
arv job create --job '{
58
 "repository":"username/reponame",
59
 "script":"compare",
60
 "script_version":"master",
61
 "script_parameters":{
62
  "foo":"bar"
63
 }
64
}' --filters '[
65
 [
66
  "repository",
67
  "=",
68
  "username/reponame"
69
 ],
70
 [
71
  "script",
72
  "=",
73
  "compare"
74
 ],
75
 [
76
  "script_version",
77
  "in git",
78
  "pipelineB-compare"
79
 ]
80
]'
81
</code></pre>
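To preview locally which commits such a filter would plausibly accept, you can ask git whether the tag is an ancestor of a given commit. This sketch assumes that "in git" matches the tagged commit and its descendants, which is consistent with the tag marking the oldest acceptable version but is not confirmed here. The throwaway repository, the @commit-X@ helper tags, and the @acceptable@ function are all hypothetical.

```shell
#!/bin/sh
set -e

# Throwaway repository with commits A..F (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Example Dev"

for name in A B C D E F; do
  git commit -q --allow-empty -m "$name"
  git tag "commit-$name"   # helper tags so each commit can be named below
done
git tag pipelineB-compare commit-E

# Hypothetical helper: succeeds if revision $2 is the commit named by $1
# or one of its descendants.
acceptable() {
  git merge-base --is-ancestor "$1" "$2"
}

for name in A B C D E F; do
  if acceptable pipelineB-compare "commit-$name"; then
    echo "$name: eligible for re-use"
  else
    echo "$name: too old"
  fi
done
```

With the tag on E, only E and F are reported as eligible.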

In a pipeline template:

<pre><code class="javascript">
"components":{
 "compare":{
  "repository":"username/reponame",
  "script":"compare",
  "script_version":"master",
  "script_parameters":{
   "foo":"bar"
  },
  "filters":[
   [
    "repository",
    "=",
    "username/reponame"
   ],
   [
    "script",
    "=",
    "compare"
   ],
   [
    "script_version",
    "in git",
    "pipelineB-compare"
   ]
  ]
 }
}
</code></pre>