Bug #10516

[CWL][Workbench] Update stats calculations for CWL running on Crunch v1

Added by Tom Morris over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
-
Target version:
Start date:
01/04/2017
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
1.0

Description

In looking at this job: https://workbench.e51c5.arvadosapi.com/pipeline_instances/e51c5-d1hrv-23r02rmoqfq5tew
I see a couple of things misleading about the times being reported:

The node scaling is reported as "1.0x" even though the CWL scattered across a large number of parallel nodes. The time from the workunit needs to be propagated up to the top level.

Work unit stats need to be calculated recursively?

PipelineInstancesHelper.determine_wallclock_runtime needs to have logic checked/updated

before-fix.png (35.7 KB) before-fix.png Radhika Chippada, 01/13/2017 12:38 AM
with-fix.png (36.1 KB) with-fix.png Radhika Chippada, 01/13/2017 12:38 AM

Subtasks

Task #10819: Review 10516-set-finished-at-on-finished-pipelinesResolvedRadhika Chippada

Task #10879: Review branch 10516-workbench-stats-logicResolvedLucas Di Pentima


Related issues

Related to Arvados - Bug #10671: [API Server] Fill in finish time when pipeline instance is marked with final stateResolved2016-12-06

Associated revisions

Revision 1d19121f
Added by Radhika Chippada about 2 years ago

refs #10516
Merge branch '10516-set-finished-at-on-finished-pipelines'

Revision 554fe927
Added by Radhika Chippada about 2 years ago

closes #10516
Merge branch '10516-workbench-stats-logic'

History

#1 Updated by Tom Morris over 2 years ago

  • Subject changed from [CWL][Workbench] Top level job stats misleading to [CWL][Workbench] Update stats calculations for CWL running on Crunch v1
  • Description updated (diff)
  • Assigned To deleted (Tom Morris)
  • Target version set to Arvados Future Sprints
  • Story points set to 1.0

#2 Updated by Tom Morris over 2 years ago

  • Description updated (diff)

#3 Updated by Tom Morris over 2 years ago

  • Target version changed from Arvados Future Sprints to 2017-01-04 sprint

#4 Updated by Ward Vandewege over 2 years ago

  • Target version changed from 2017-01-04 sprint to Arvados Future Sprints

#5 Updated by Tom Morris about 2 years ago

  • Assigned To set to Radhika Chippada
  • Target version changed from Arvados Future Sprints to 2017-01-18 sprint

#6 Updated by Radhika Chippada about 2 years ago

Tom M:

I do not have access to this cluster, but searching on qr1hi I found the following:

https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-mipcvxo8wbz7b6l#Components - 1.1 times scaling

https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-dyze3ur68v1dy0l - 2.4 times scaling

It appears that here the timing stats are calculated correctly.

Can you please add an image of the pipeline and it's json (you can get it from the Advanced tab) to the ticket so that I can debug. If this doesn't help, I will ask Nico for access to this env. Thanks.

#7 Updated by Radhika Chippada about 2 years ago

I noticed one issue with finished_at time while debugging this issue. In #10671 we addressed the scenario where finished_at was set on the server if the client did not set it, when a pipeline is finished. However, we still have missing finished_at on pipelines that were already completed prior to this fix.

Discussed with Tom and we agreed to add a migration script to set the finished_at time to the pipeline's modified_at time (which must have been set at the time of completion or later on if something else changed on it).

#8 Updated by Radhika Chippada about 2 years ago

Branch 10516-set-finished-at-on-finished-pipelines @ ccb95499 addresses note 7 (sets the finished_at timestamp when it is missing on a finished pipeline instance).

#9 Updated by Lucas Di Pentima about 2 years ago

Branch 10516-set-finished-at-on-finished-pipelines LGTM

#10 Updated by Radhika Chippada about 2 years ago

Branch 10516-workbench-stats-logic @ b38aee84aea043eb7bcb3acbab0a8ef64edf0838

Addresses the scaling factor error.

The image "before-fix" is what is there in production now, and "with-fix" is what we see after the scaling factor correction.

Tests @ https://ci.curoverse.com/job/developer-run-tests/135/ (I fixed the one failing unit test after this run).

#11 Updated by Radhika Chippada about 2 years ago

#12 Updated by Lucas Di Pentima about 2 years ago

Reviewing branch 10516-workbench-stats-logic:

Updates LGTM, but maybe an additional test case is needed, to test when cputime > walltime? That is, a test that should have alerted us of this bug earlier?

#13 Updated by Radhika Chippada about 2 years ago

@ 6ac8cc38

Updated the unit/work_unit_test.rb to compare cputime and walltime.

Tests @ https://ci.curoverse.com/job/developer-run-tests/137/

#14 Updated by Lucas Di Pentima about 2 years ago

LGTM, please merge.

#15 Updated by Radhika Chippada about 2 years ago

  • Status changed from New to Resolved

Applied in changeset arvados|commit:554fe927169e928d91c2d8c4bed158aef4d4d746.

Also available in: Atom PDF