Project

General

Profile

Actions

Bug #5490

closed

[Crunch] Job should not be reused when docker tag _or_ arvados sdk version now resolves to a different image/commit

Added by Abram Connelly over 9 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
1.0

Description

I updated the 'arvados_sdk_version' in my pipeline and ran the pipeline. The first two jobs in the pipeline got re-used even though the more recent version of 'arvados_sdk_version' was specified.

Pipeline instance tb05z-d1hrv-kkqz083qq34l4aa . The pipeline has three jobs, all of which had the 'runtime_constraints.arvados_sdk_version' updated to be 'd8abf0b06cc38a1e1030e04966d0cb616eae8c2e'. The third job in the pipeline had a more recent 'script_version' so it got re-run properly, but the first two jobs had only the 'arvados_sdk_version' updated and were re-used.


Subtasks 1 (0 open1 closed)

Task #5784: Review 5490-crunch-tighten-job-reuse-wipResolvedBrett Smith04/21/2015Actions

Related issues

Related to Arvados - Bug #5388: [Crunch] Jobs do not re-run with latest docker image if job is unchanged but docker image is updatedResolvedBrett Smith03/05/2015Actions
Related to Arvados - Bug #5688: [API] Accept a git revision range in runtime_constraints, deprecate "min" and "max" revision filters.ResolvedActions
Actions #1

Updated by Peter Amstutz over 9 years ago

  • Tracker changed from Bug to Task
  • Target version changed from Bug Triage to 2015-04-29 sprint
Actions #2

Updated by Peter Amstutz over 9 years ago

  • Tracker changed from Task to Bug
Actions #3

Updated by Tom Clegg over 9 years ago

  • Subject changed from Updating 'arvados_sdk_version' reuses job when it shouldn't to [Crunch] Job should not be reused when docker tag _or_ arvados sdk version now resolves to a different image/commit
Actions #4

Updated by Tom Clegg over 9 years ago

  • Story points set to 1.0
Actions #5

Updated by Brett Smith over 9 years ago

  • Assigned To set to Brett Smith
Actions #6

Updated by Brett Smith over 9 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Brett Smith over 9 years ago

tb05z-d1hrv-zm8v47frcn1mqgn might be more illustrative: it reused jobs with arvados_sdk_version d8abf0b06cc38a1e1030e04966d0cb616eae8c2e, even though the components clearly specify 2507d47331437166d89f6147faaf8aaabb965a94, which is unambiguously newer (check the arvados repository git log).

Actions #8

Updated by Brett Smith over 9 years ago

The example above stems from the way we interpret git version ranges. Here's the git rev-parse call generated by the Commit model, and sure enough, it includes the older commit:

brinstar % git rev-list --max-count=1 HEAD --
397191893819083925600a61e2f355a3b6513354
brinstar % git rev-list 2507d47331437166d89f6147faaf8aaabb965a94..397191893819083925600a61e2f355a3b6513354 -- | grep d8abf0b06cc38a1e1030e04966d0cb616eae8c2e
d8abf0b06cc38a1e1030e04966d0cb616eae8c2e
Actions #9

Updated by Brett Smith over 9 years ago

tb05z-d1hrv-33xt7slr5xvje5a is a copy of the same pipeline instance that specifies arvados_sdk_version 18148ec24bf63ec3353a2cd38f6f07c460037db2. That version also comes after d8abf0b06cc38a1e1030e04966d0cb616eae8c2e, but the latter is not in range from the former:

brinstar % git rev-parse 18148ec24bf63ec3353a2cd38f6f07c460037db2..HEAD | grep d8abf0b06cc38a1e1030e04966d0cb616eae8c2e
[nothing]

This pipeline instance is not reusing jobs, but running from scratch.

Right now I'm inclined to chalk up the original bug to a combination of odd pipeline specifications, and a relatively involved git history in the arvados repository that's difficult to reason about linearly.

Actions #10

Updated by Brett Smith over 9 years ago

From IRC:

(11:54:19 AM) Me: Instead, I think it's just one of those things where the API server uses much more precise git semantics than most users are accustomed to.
(11:55:04 AM) tomclegg: brett: I'm thinking: "related to #5688"
(11:55:12 AM) Me: tomclegg: Me too, that's why I flagged it for you.
(11:55:31 AM) Me: Although even then, I'm not sure how well users would take this.
(11:55:42 AM) tetron_: maybe it should be using --first-parent?
(11:55:51 AM) Me: As long as we boil down to a git range at some point, I think the way git ranges traverse the graph will be surprising to some users some of the time.
(11:56:21 AM) tomclegg: right. I think one important advantage of using real git ranges is that you can test them yourself on your repository to find out what they resolve to.
(11:57:20 AM) Me: So are you reasonably satisfied there's no code change to make here before 5688?
(11:58:06 AM) tomclegg: ...that, plus giving users good examples ("use a range like X in a situation like Y"), should hopefully go a long way
(11:58:23 AM) Me: So you're saying 5490 is now partly a documentation story. :)
(11:58:33 AM) tomclegg: brett: yes, I think the only thing worth doing here would be documentation.
(11:58:40 AM) Me: Because for arvados_sdk_version specifically, users can't specify a range as such.
(11:59:12 AM) Me: They can only specify the version they want, and then Arvados will look for jobs using an SDK between that version and HEAD (yes, yes, I know) in the arvados repository.

Actions #11

Updated by Brett Smith over 9 years ago

This bug and #3341 have the same basic underlying issue: when you don't specify your own filters for these runtime constraints, the API server's jobs controller searches on ranges based on your input, and will reuse any job within the found range. For arvados_sdk_version, the range is "any git commit between the specified version and HEAD in the arvados repository." For docker_image, the range is "any Docker image with a matching name (all of repository, name, and tag, whatever's specified)."

All these reports make it clear that this is not the default behavior our users want. Instead, the default filters should act more like the behavior users are used to around a job's script_version:

  • If the name is symbolic (a git ref or Docker image name), find the latest SDK version or Docker image with that name, and only consider reusing jobs that used that.
  • If the name is a content address, only consider reusing jobs that used that exact SDK version or Docker image.

These rules are relatively straightforward to implement, complicated only by the fact that you can't assume something that looks like a content address actually is. For example:

brinstar % git init
Initialized empty Git repository in /home/brett/temp/shabranch/.git/
brinstar % git commit --allow-empty -m 'initial commit'
[master (root-commit) c02446d] initial commit
brinstar % git log
commit c02446dae863dc49784b015758a85ec3c72ef04c
Author: Brett Smith <brett@curoverse.com>
Date:   Thu Apr 16 15:28:49 2015 -0400

    initial commit
brinstar % git checkout -b c02446dae863dc49784b015758a85ec3c72ef04c
Switched to a new branch 'c02446dae863dc49784b015758a85ec3c72ef04c'
brinstar % git checkout master
Switched to branch 'master'
brinstar % git checkout c02446dae863dc49784b015758a85ec3c72ef04c
Switched to branch 'c02446dae863dc49784b015758a85ec3c72ef04c'

(Note the last line checks out a branch, not a commit in detached HEAD state.)

For both these searches, we should try to search by name, and then try by content address if that doesn't yield anything. This would be most consistent with git's own rules.

Actions #12

Updated by Brett Smith over 9 years ago

More design from IRC:

(03:36:45 PM) Me: So maybe the right thing to do is to always interpret hashes as commits over branches when ambiguous, and let people specify heads/<hash> to disambiguate?
(03:37:25 PM) Me: That is internally consistent, probably less surprising in the long run, and doesn't make anything unaddressable.
(03:37:27 PM) tetron_: yes
(03:39:52 PM) tomclegg: well... that sounds like it breaks pjotr's workflow, fwiw
(03:40:10 PM) Me: tomclegg: Howso?
(03:40:25 PM) tomclegg: I still think if I name a branch "bad" then "bad" should mean my branch, even if a commit starts with "bad"
(03:40:39 PM) tomclegg: s/commit/commit sha1/
(03:41:20 PM) tomclegg: If you define "hashes" as full 40-char sha1 then I agree with your "right thing to do" though
(03:41:32 PM) Me: Yeah, that's what I meant.
(03:41:35 PM) tomclegg: oh ok
(03:41:38 PM) tomclegg: nm then, carry on
(03:41:40 PM) Me: Sorry.
(03:41:52 PM) Me: I agree that if a short name matches a branch name that's a better match than any commit expansion.

Actions #13

Updated by Radhika Chippada about 9 years ago

Brett: LGTM. I could not find any issues to comment. Thanks.

Actions #14

Updated by Brett Smith about 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:03d89302f14a4a1c448b32d62679bab14e23fa23.

Actions

Also available in: Atom PDF