[Documentation] Best practices for getting the most out of job re-use
Not sure how to name this, but I ran into the following situation:
I have 10 jobs running in a pipeline: 9 complete and 1 fails. Each job calls a different script in my git repo. I've set script_version to master on each job so that I don't have to update the script version every time I make a commit. I edit the script for the failing job and commit/push the change. I then use "re-run with latest" so that Arvados picks up my new commit. All 10 jobs re-run, which is not what I intended: I only wanted the failed job to re-run.
It would be nice if the pipeline runner could check whether the scripts behind the successfully completed jobs have changed before re-running those jobs.
#2 Updated by Tom Clegg almost 5 years ago
- Category set to Documentation
I suspect it's not safe to implement this feature exactly as requested ("re-use if the script hasn't changed, even though the branch has changed"): predicting whether a given (changed) file will be loaded as a dependency by the (unchanged) script is intractable. However, there is surely a development cycle that reuses the right jobs in this common scenario. Possibilities:
- Instead of specifying "master" as the only acceptable version, specify a range like "fix-bug-X^..master"
- Write the pipeline template with a separate branch for each pipeline component (e.g. a componentX branch for component X). Commit fixes to master, but don't move the componentX branch unless you want componentX jobs to re-run. When the pipeline runs to completion, move all component branches up to master and re-run to make sure every component still works at master. Then change the template to use the same version ("master" or "stable" or whatever) for all components.
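To make the two suggestions concrete, here is a toy model in Python. It is not Arvados code: it assumes a component is reused only when an earlier successful job of that component ran at a commit its script_version accepts, and it ignores script parameters, Docker images and everything else the real reuse check may consider. It also assumes a range is interpreted like git's A..B notation (commits reachable from B but not from A). The commits, refs and the helpers resolve, acceptable and to_rerun are hypothetical.

```python
from typing import Dict, List, Set

# Toy model, not the Arvados implementation.
# Hypothetical history: c0 <- c1 <- c2. The first pipeline run used c1;
# c2 is the fix for the one failing component, pushed to master.
PARENTS: Dict[str, List[str]] = {"c0": [], "c1": ["c0"], "c2": ["c1"]}
REFS: Dict[str, str] = {
    "master": "c2",
    "fix-bug-X": "c1",   # marks the commit the first run used
    "componentA": "c1",
    "componentB": "c2",  # moved up to master so B picks up the fix
    "componentC": "c1",
}
# Successful jobs from the first run (component -> commit); B failed.
SUCCEEDED: Dict[str, str] = {"A": "c1", "C": "c1"}


def resolve(rev: str) -> str:
    """Resolve a ref name; each trailing '^' steps to the first parent."""
    if rev.endswith("^"):
        return PARENTS[resolve(rev[:-1])][0]
    return REFS.get(rev, rev)


def ancestors(commit: str) -> Set[str]:
    """Every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def acceptable(spec: str) -> Set[str]:
    """'A..B' accepts commits reachable from B but not from A;
    a single ref accepts only the commit it currently resolves to."""
    if ".." in spec:
        lo, hi = spec.split("..")
        return ancestors(resolve(hi)) - ancestors(resolve(lo))
    return {resolve(spec)}


def to_rerun(template: Dict[str, str]) -> List[str]:
    """Components with no earlier successful job at an acceptable commit."""
    return sorted(
        name for name, spec in template.items()
        if SUCCEEDED.get(name) not in acceptable(spec)
    )


everything_on_master = {n: "master" for n in "ABC"}
with_range = {n: "fix-bug-X^..master" for n in "ABC"}
per_branch = {n: "component" + n for n in "ABC"}

print(to_rerun(everything_on_master))  # ['A', 'B', 'C']
print(to_rerun(with_range))            # ['B']
print(to_rerun(per_branch))            # ['B']
```

Pinning everything to "master" reproduces the original report: the push moves master to c2, so no earlier job matches and all components re-run. The range "fix-bug-X^..master" accepts fix-bug-X itself (c1) through master, so A and C stay reusable. With per-component branches, moving componentA or componentC up to master later would make those jobs re-run, which is the final "does everything still work at master" check.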