Feature #3899

[Crunch] crunch-job and Workbench use the new job state attribute (see #3898) instead of paying attention to (and updating) running and success flags.

Added by Tom Clegg about 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
09/24/2014
Due date:
% Done:

100%

Estimated time:
(Total: 1.00 h)
Story points:
2.0

Description

Expected changes:
  • Crunch-job reads/writes the new state attribute instead of the old success/running flags.
  • Workbench uses the state attribute provided by API, instead of deriving its own state string by comparing running/success/cancelled_at etc.
  • ditto arv-run-pipeline-instance

Subtasks

Task #3962: Update crunch-dispatchResolvedPeter Amstutz

Task #3963: Update arv-run-pipeline-instanceResolvedPeter Amstutz

Task #3982: Review 3988-crunch-use-job-stateResolvedTom Clegg

Task #3921: Update crunch-jobResolvedPeter Amstutz

Associated revisions

Revision 1b189a09
Added by Peter Amstutz about 5 years ago

Merge branch '3899-crunch-use-job-state' closes #3899

Revision e6fbedf9 (diff)
Added by Peter Amstutz about 5 years ago

3899: Fix for counting complete/failed components refs #3899

History

#1 Updated by Peter Amstutz about 5 years ago

  • Target version changed from Arvados Future Sprints to 2014-10-08 sprint

#2 Updated by Peter Amstutz about 5 years ago

  • Assigned To set to Peter Amstutz
  • Story points changed from 1.0 to 2.0

#3 Updated by Peter Amstutz about 5 years ago

Design proposal to implement this story in a way that also fixes #3859

  • Changing state from "queued" to "running" also sets is_locked_by_uuid and started_at. Fails if the starting state is not "queued".
  • Change is_locked_by_uuid to is_locked_by_token which is the API token that was issued to job instead of the user uuid. Restrict is_locked_by_token column from everybody except admins and crunch-dispatch.
  • Restrict further changes to the job record to that API token.
  • Set finished_at when state changes from "Running" to one of Complete, Cancelled, Failed

#4 Updated by Peter Amstutz about 5 years ago

"Consider what happens when running a job in local mode"

#5 Updated by Tom Clegg about 5 years ago

  • Description updated (diff)

#6 Updated by Tom Clegg about 5 years ago

Notes @ 9daebff...

In arv-run-pipeline-instance the map/reduce bits deserve to be less verbose. Suggest something like this:
  • -    ended = @components.map { |cname, c| 
    -      if c[:job] and ["Complete", "Failed", "Cancelled"].include? c[:job][:state] then 1 else 0 end 
    -    }.reduce(:+) || 0
    -
    -    succeeded = @components.map { |cname, c| 
    -      if c[:job] and ["Complete"].include? c[:job][:state] then 1 else 0 end 
    -    }.reduce(:+) || 0
    -
    -    failed = @components.map { |cname, c| 
    -      if c[:job] and ["Failed", "Cancelled"].include? c[:job][:state] then 1 else 0  end 
    -    }.reduce(:+) || 0
    
    +    c_in_state = @components.values.group_by { |c| 
    +      c[:job] and c[:job][:state]
    +    }
    +    succeeded = c_in_state["Complete"].count
    +    failed = c_in_state["Failed"].count + c_in_state["Cancelled"].count
    +    ended = succeeded + failed
    
crunch-job: ooh, nice secret bugfix:
  • -exit ($Job->{'success'} ? 1 : 0);
    +exit ($Job->{'state'} != 'Complete' ? 1 : 0);
    
crunch-job: This error message could contain misleading information if the state changes to something other than Cancelled. I think it's worth separating the expected case ("Job cancelled at ... by ...") from the unexpected case ("Job state unexpectedly changed to X. Who could work this way? I give up."), which would also avoid the misleading message.
  • -      if ($Job->{'cancelled_at'}) {
    -        Log (undef, "Job cancelled at " . $Job->{cancelled_at} .
    +      if ($Job->{'state'} ne "Running") {
    +        Log (undef, "Job state changed to " . $Job->{'state'} . " at " . $Job->{cancelled_at} .
                  " by user " . $Job->{cancelled_by_user_uuid});
    
crunch-job: I think this should test state==Cancelled instead of testing cancelled_at, and therefore shouldn't bother setting state to Cancelled:
  • +  if ($Job->{'cancelled_at'}) {
    +    $Job->update_attributes('state' => 'Cancelled',
    +                            'finished_at' => scalar gmtime);
    
Todo:
  • workbench pipeline_instances_helper.rb should use new state attr
  • workbench Job.state class method should go away (afaict only app/views/application/_job_status_label.html.erb uses this, presumably it should now use "j.state" instead of "Job.state j")

#7 Updated by Peter Amstutz about 5 years ago

Tom Clegg wrote:

Notes @ 9daebff...

In arv-run-pipeline-instance the map/reduce bits deserve to be less verbose. Suggest something like this:
  • [...]

Nice! Fixed.

crunch-job: This error message could contain misleading information if the state changes to something other than Cancelled. I think it's worth separating the expected case ("Job cancelled at ... by ...") from the unexpected case ("Job state unexpectedly changed to X. Who could work this way? I give up."), which would also avoid the misleading message.
  • [...]

Fixed.

crunch-job: I think this should test state==Cancelled instead of testing cancelled_at, and therefore shouldn't bother setting state to Cancelled:
  • [...]

Fixed.

Todo:
  • workbench pipeline_instances_helper.rb should use new state attr

Fixed.

  • workbench Job.state class method should go away (afaict only app/views/application/_job_status_label.html.erb uses this, presumably it should now use "j.state" instead of "Job.state j")

If we get rid of Job.state it breaks the one specific case where we're looking at an old pipeline instance record and for some reason the actual job object isn't available so we're looking at the unmigrated copy in 'components'. Unless you feel strongly about it, I would prefer to leave it since it's more effort to change it (also Job::state is used in several other partials).

#8 Updated by Tom Clegg about 5 years ago

Now at 2861857

Peter Amstutz wrote:

  • workbench Job.state class method should go away (afaict only app/views/application/_job_status_label.html.erb uses this, presumably it should now use "j.state" instead of "Job.state j")

If we get rid of Job.state it breaks the one specific case where we're looking at an old pipeline instance record and for some reason the actual job object isn't available so we're looking at the unmigrated copy in 'components'. Unless you feel strongly about it, I would prefer to leave it since it's more effort to change it (also Job::state is used in several other partials).

Ah, I see now. Perhaps adding a state field when needed in the pipeline helper's "make a fake job object" might be a more appropriate way to do this, though? That way Job::state j could get replaced with j[:state] everywhere, instead of having that mystery scattered throughout and expected of all future code.

Everything else LGTM, thanks!

#9 Updated by Peter Amstutz about 5 years ago

  • Status changed from New to In Progress

#10 Updated by Peter Amstutz about 5 years ago

Tom Clegg wrote:

Ah, I see now. Perhaps adding a state field when needed in the pipeline helper's "make a fake job object" might be a more appropriate way to do this, though? That way Job::state j could get replaced with j[:state] everywhere, instead of having that mystery scattered throughout and expected of all future code.

Fixed, but I can't test it because of other bugs rendering the 'job' portion of pipeline components. I think Tim is working on this so I don't want to duplicate his work.

#11 Updated by Tom Clegg about 5 years ago

Missing newline at EOF @services/api/test/fixtures/jobs.yml.

In apps/workbench/app/helpers/pipeline_instances_helper.rb I suspect we should leave pj[:job][:state] alone if it's already there.

Thanks

#12 Updated by Peter Amstutz about 5 years ago

  • Status changed from In Progress to New

Tom Clegg wrote:

Missing newline at EOF @services/api/test/fixtures/jobs.yml.

Fixed.

In apps/workbench/app/helpers/pipeline_instances_helper.rb I suspect we should leave pj[:job][:state] alone if it's already there.

Fixed.

#13 Updated by Anonymous about 5 years ago

  • Status changed from New to Resolved
  • % Done changed from 75 to 100

Applied in changeset arvados|commit:1b189a0961ba757caf6160285b59daa26c7cdcae.

Also available in: Atom PDF