Git strategy for pipeline development
The following scenario is common:
- You have a project that involves one or two pipelines
- Each pipeline has many components
- These pipelines and components make use of a common code base
Example:
crunch scripts in repo:

    crunch_scripts/align
    crunch_scripts/call
    crunch_scripts/compare

pipeline A:

    align(1)    align(2)
       |           |
    call(1)     call(2)
       |____   ____|
            | |
          compare

pipeline B:

    align(1)    align(2)    align(3)
       |           |           |
    call(1)     call(2)     call(3)
       |_________  |  _________|
                 | | |
                compare

- Fix a bug in compare that was making it fail when given 3 inputs.
  - Fix the code in commit "C", push, and re-run. (Note: results from previous runs that succeeded are still valid.)
- Find a bug in compare that was making it produce incorrect output when given 3 inputs.
  - Fix the code in commit "E", push, and re-run. Update pipeline template B to prevent the broken jobs (the ones that are marked "success" but produced incorrect outputs) from being re-used in future pipeline runs. (Note: results from previous jobs from pipeline A are still OK.)
A strategy
The master branch has the latest stable version of everything.
For each component of each template, tag the oldest acceptable version.
    tags →      pipelineA-compare   pipelineB-compare
                ↓                   ↓
    commits →   A----B----C----D----E----F
                                         ↑
    branches →                         master

Tell your pipeline (or other job creation script) to
- re-use existing jobs as long as they use pipelineB-compare or a newer version
- use "master" if no existing job is suitable
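
The reuse rule above can be tried out locally as a git ancestry test. This is a minimal sketch, not a description of how Arvados evaluates its filters: it assumes "acceptable" means "the tagged commit or any descendant of it". The throwaway repository, the demo identity, and the helper name is_acceptable are all made up for illustration.

```shell
# Throwaway repository with six empty commits, A through F, on master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for msg in A B C D E F; do
  git -c user.name=demo -c user.email=demo@example.com \
      commit -q --allow-empty -m "$msg"
done

# Tag commit E (one step behind master) as the oldest acceptable version.
git tag pipelineB-compare master~1

# Hypothetical helper: exit status 0 when commit $2 is the tagged
# commit $1 or one of its descendants.
is_acceptable() {
  git merge-base --is-ancestor "$1" "$2"
}

# Decide, for commits D, E, and F, whether a job built from them could be reused.
for commit in master~2 master~1 master; do
  if is_acceptable pipelineB-compare "$commit"; then
    echo "$(git log -1 --format=%s "$commit"): reuse"
  else
    echo "$(git log -1 --format=%s "$commit"): re-run"
  fi
done
```

Run as written, this should report re-run for D and reuse for E and F, which matches the intent of tagging E as the oldest acceptable version.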
Other tagging schemes are possible:
- "compare" -- just use this tag for all pipelines that use the "compare" script. When you find a bug that produced bad output, just move the tag, and all pipelines will stop using the buggy code.
- "compare-3way-bugfix" -- tag each bugfix, and add them to the pipelines where the bug could be a problem. Of course this presumes you'd rather trust yourself to keep track of which pipelines need which bugfixes than waste resources re-generating perfectly good outputs.
- "ok" -- tag the whole repo, and use this as the earliest acceptable revision for all jobs/components. This is safer: if you fix library code in file C in order to fix job A, and forget that script B also uses code from file C, everything is fine because all jobs that used anything in the old repo will be ineligible for reuse.
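
Moving a tag, as the "compare" scheme above requires, takes a forced update both locally and on the shared repository. The following is a sketch under stated assumptions: the bare repository stands in for wherever your pipeline code is hosted, and the paths, demo identity, and commit messages are invented for the example.

```shell
# Throwaway "server" repository and working copy; all paths are temporary.
remote=$(mktemp -d)
work=$(mktemp -d)
git init -q --bare "$remote"
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master
git remote add origin "$remote"
for msg in A B C D E; do
  git -c user.name=demo -c user.email=demo@example.com \
      commit -q --allow-empty -m "$msg"
done

# Initially every commit from A onward is acceptable for the "compare" script.
git tag compare master~4
git push -q origin master refs/tags/compare

# Commit E fixes a bug that produced bad output: move the tag forward.
# -f is needed both locally and on push because the tag already exists.
git tag -f compare master
git push -q -f origin refs/tags/compare

# Show which commit the shared repository now treats as the oldest acceptable one.
git --git-dir="$remote" log -1 --format=%s compare
```

Because the tag is overwritten in place, the previous position is not recorded anywhere; if you want a history of which revisions were once considered acceptable, the per-bugfix scheme keeps one by construction.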
The way you specify a range of acceptable revisions is a bit weird, but here it is:
arv job create --job '{
  "repository":"username/reponame",
  "script":"compare",
  "script_version":"master",
  "script_parameters":{
    "foo":"bar"
  }
}' --filters '[
  ["repository", "=", "username/reponame"],
  ["script", "=", "compare"],
  ["script_version", "in git", "pipelineB-compare"]
]'
In a pipeline template:
"components":{
  "compare":{
    "repository":"username/reponame",
    "script":"compare",
    "script_version":"master",
    "script_parameters":{
      "foo":"bar"
    },
    "filters":[
      ["repository", "=", "username/reponame"],
      ["script", "=", "compare"],
      ["script_version", "in git", "pipelineB-compare"]
    ]
  }
}
Updated by Tom Clegg over 8 years ago · 2 revisions