Project

General

Profile

Actions

Bug #10516

closed

[CWL][Workbench] Update stats calculations for CWL running on Crunch v1

Added by Tom Morris about 8 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Radhika Chippada
Category:
-
Target version:
Story points:
1.0

Description

In looking at this job: https://workbench.e51c5.arvadosapi.com/pipeline_instances/e51c5-d1hrv-23r02rmoqfq5tew
I see a couple of things misleading about the times being reported:

The node scaling is reported as "1.0x" even though the CWL scattered across a large number of parallel nodes. The time from the workunit needs to be propagated up to the top level.

Work unit stats need to be calculated recursively?

PipelineInstancesHelper.determine_wallclock_runtime needs to have logic checked/updated


Files

before-fix.png (35.7 KB) before-fix.png Radhika Chippada, 01/13/2017 12:38 AM
with-fix.png (36.1 KB) with-fix.png Radhika Chippada, 01/13/2017 12:38 AM

Subtasks 2 (0 open2 closed)

Task #10819: Review 10516-set-finished-at-on-finished-pipelinesResolvedRadhika Chippada01/04/2017Actions
Task #10879: Review branch 10516-workbench-stats-logicResolvedLucas Di Pentima01/13/2017Actions

Related issues 1 (0 open1 closed)

Related to Arvados - Bug #10671: [API Server] Fill in finish time when pipeline instance is marked with final stateResolvedLucas Di Pentima12/06/2016Actions
Actions #1

Updated by Tom Morris about 8 years ago

  • Subject changed from [CWL][Workbench] Top level job stats misleading to [CWL][Workbench] Update stats calculations for CWL running on Crunch v1
  • Description updated (diff)
  • Assigned To deleted (Tom Morris)
  • Target version set to Arvados Future Sprints
  • Story points set to 1.0
Actions #2

Updated by Tom Morris about 8 years ago

  • Description updated (diff)
Actions #3

Updated by Tom Morris about 8 years ago

  • Target version changed from Arvados Future Sprints to 2017-01-04 sprint
Actions #4

Updated by Ward Vandewege about 8 years ago

  • Target version changed from 2017-01-04 sprint to Arvados Future Sprints
Actions #5

Updated by Tom Morris almost 8 years ago

  • Assigned To set to Radhika Chippada
  • Target version changed from Arvados Future Sprints to 2017-01-18 sprint
Actions #6

Updated by Radhika Chippada almost 8 years ago

Tom M:

I do not have access to this cluster, but searching on qr1hi I found the following:

https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-mipcvxo8wbz7b6l#Components - 1.1 times scaling

https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-dyze3ur68v1dy0l - 2.4 times scaling

It appears that here the timing stats are calculated correctly.

Can you please add an image of the pipeline and it's json (you can get it from the Advanced tab) to the ticket so that I can debug. If this doesn't help, I will ask Nico for access to this env. Thanks.

Actions #7

Updated by Radhika Chippada almost 8 years ago

I noticed one issue with finished_at time while debugging this issue. In #10671 we addressed the scenario where finished_at was set on the server if the client did not set it, when a pipeline is finished. However, we still have missing finished_at on pipelines that were already completed prior to this fix.

Discussed with Tom and we agreed to add a migration script to set the finished_at time to the pipeline's modified_at time (which must have been set at the time of completion or later on if something else changed on it).

Actions #8

Updated by Radhika Chippada almost 8 years ago

Branch 10516-set-finished-at-on-finished-pipelines @ ccb95499 addresses note 7 (sets the finished_at timestamp when it is missing on a finished pipeline instance).

Actions #9

Updated by Lucas Di Pentima almost 8 years ago

Branch 10516-set-finished-at-on-finished-pipelines LGTM

Actions #10

Updated by Radhika Chippada almost 8 years ago

Branch 10516-workbench-stats-logic @ b38aee84aea043eb7bcb3acbab0a8ef64edf0838

Addresses the scaling factor error.

The image "before-fix" is what is there in production now, and "with-fix" is what we see after the scaling factor correction.

Tests @ https://ci.curoverse.com/job/developer-run-tests/135/ (I fixed the one failing unit test after this run).

Updated by Radhika Chippada almost 8 years ago

Actions #12

Updated by Lucas Di Pentima almost 8 years ago

Reviewing branch 10516-workbench-stats-logic:

Updates LGTM, but maybe an additional test case is needed, to test when cputime > walltime? That is, a test that should have alerted us of this bug earlier?

Actions #13

Updated by Radhika Chippada almost 8 years ago

@ 6ac8cc38

Updated the unit/work_unit_test.rb to compare cputime and walltime.

Tests @ https://ci.curoverse.com/job/developer-run-tests/137/

Actions #14

Updated by Lucas Di Pentima almost 8 years ago

LGTM, please merge.

Actions #15

Updated by Radhika Chippada almost 8 years ago

  • Status changed from New to Resolved

Applied in changeset arvados|commit:554fe927169e928d91c2d8c4bed158aef4d4d746.

Actions

Also available in: Atom PDF