Story #3898

[API] Job model has a single state attribute that can be updated and read instead of paying attention to running and success flags.

Added by Tom Clegg almost 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
-
Target version:
Start date:
09/19/2014
Due date:
% Done:

100%

Estimated time:
(Total: 1.00 h)
Story points:
0.5

Description

Before validation:
  • If state has changed to Running, set started_at to now
  • If state has changed to Failed or Complete, set finished_at to now
  • If state has changed to Cancelled, set cancelled_at to now
  • If state has changed at all, set running and success (see below)
  • If state has not changed, and any of {running, success, cancelled_at} have changed, update state accordingly
  • If state is nil, update state according to running, success, cancelled_at
Validation:
  • If state has changed to Running, and is_locked_by_uuid is nil, return false
Translating from running/success/cancelled_at to state:
running success cancelled_at state
any any not nil Cancelled
any false nil Failed
any true nil Complete
true nil nil Running
false nil nil Queued
Translating from state to running/success:
state running success
Queued false nil
Running true nil
Cancelled false false
Failed false false
Complete false true

Subtasks

Task #3906: Add state columnResolvedRadhika Chippada

Task #3939: review 3898-job-state-attr-fixesResolvedWard Vandewege

Task #3934: Review branch: 3898-job-state-attrResolvedPeter Amstutz

Task #3951: review 3898-job-state-attr-TCResolved

Associated revisions

Revision 3cc699f7
Added by Radhika Chippada almost 5 years ago

closes #3898
Merge branch '3898-job-state-attr'

Revision f47d3649
Added by Radhika Chippada almost 5 years ago

refs #3898
Merge branch '3898-job-state-attr'

Revision e8c3c3b8 (diff)
Added by Ward Vandewege almost 5 years ago

Extra possible job state:

not cancelled, not successful, but finished_at is set -> Failed

refs #3898

Revision 68b07b79 (diff)
Added by Ward Vandewege almost 5 years ago

Extra job state:

not started, not cancelled, not locked, not successful -> Queued

refs #3898

Revision 77a5add9 (diff)
Added by Ward Vandewege almost 5 years ago

Bugfix: do not try to overwrite docker_image_locator if it is already set.

refs #3898

Revision 9dca897d (diff)
Added by Ward Vandewege almost 5 years ago

Jobs that are not started, not cancelled, and locked should be marked Failed.

refs #3898

Revision 4a577dc5 (diff)
Added by Ward Vandewege almost 5 years ago

Jobs in any other state should be failed.

refs #3898

Revision 7e27eb2c (diff)
Added by Ward Vandewege almost 5 years ago

Bugfix: do not try to overwrite docker_image_locator if it is already set.

refs #3898

Revision d6478bb2
Added by Tom Clegg almost 5 years ago

Merge branch '3898-job-state-attr-TC' closes #3898

Revision 1fc256ce (diff)
Added by Tom Clegg almost 5 years ago

Revert "Bugfix: do not try to overwrite docker_image_locator if it is already set." refs #3898

This reverts commit 7e27eb2cca7e9e22ad1f56a6f0ecbbc40ad4cb64.

Revision 9b16ff9e (diff)
Added by Peter Amstutz almost 5 years ago

Use job state field instead of running/success columns. refs #3898

History

#1 Updated by Tom Clegg almost 5 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-08 sprint

#2 Updated by Peter Amstutz almost 5 years ago

  • Assigned To set to Peter Amstutz

#3 Updated by Peter Amstutz almost 5 years ago

  • Assigned To changed from Peter Amstutz to Radhika Chippada

#4 Updated by Radhika Chippada almost 5 years ago

  • Status changed from New to In Progress
  • Allowed States:

Queued, Running, Cancelled, Failed, Complete

  • Use the following to determine state from current success and running attributes:
Running Success State
any False Failed
any True Complete
True nil Running
Queued started_at, is_locked_by_uuid, cancelled_at and success are all nil
Cancelled cancelled_at attribute is set
  • Fail migration if Job cannot enter any of these 5 states.

#5 Updated by Tom Clegg almost 5 years ago

  • Description updated (diff)

#6 Updated by Radhika Chippada almost 5 years ago

I set started_at, cancelled_at, and finished_at to Time.now only when they are not set. Peter thinks the crunch job would be setting them, and better to set them if they are not already set.

#7 Updated by Radhika Chippada almost 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset arvados|commit:3cc699f70a514878b7ce7adbc3c96e73f92fdf82.

#8 Updated by Ward Vandewege almost 5 years ago

  • Status changed from Resolved to In Progress

#9 Updated by Radhika Chippada almost 5 years ago

Review comments for branch: 3898-job-state-attr-fixes

LGTM. I was also initially concerned about including !is_locked_by_uuid in checking the status when state is null. This seems to have been problematic and I do not see any harm in omitting this in this check.

Thanks for addressing this.

#10 Updated by Tom Clegg almost 5 years ago

Nitpicks about this comment
  • +        # race condition for jobs that have just been grabbed by crunch-dispatch but haven't been marked as started yet...
  • It's crunch-job that does the grabbing, not crunch-dispatch
  • crunch-job sets started_at and is_locked_by_uuid in the same update transaction, so is this race really possible?

I think the bigger problem to fix here is that Job is now trying to manipulate the old state fields in interesting ways, even when clients are not using the new state field. This seems likely to create/expose new bugs in code paths that we are just about to replace, which seems a dubious use of time...

#11 Updated by Anonymous almost 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 75 to 100

Applied in changeset arvados|commit:d6478bb286c0bb1e7b8af67fb4800db792200022.

Also available in: Atom PDF