Bug #10516
closed
[CWL][Workbench] Update stats calculations for CWL running on Crunch v1
Description
In looking at this job: https://workbench.e51c5.arvadosapi.com/pipeline_instances/e51c5-d1hrv-23r02rmoqfq5tew
I see a couple of things misleading about the times being reported:
The node scaling is reported as "1.0x" even though the CWL workflow scattered across a large number of parallel nodes. The time from the work units needs to be propagated up to the top level.
Work unit stats need to be calculated recursively?
PipelineInstancesHelper.determine_wallclock_runtime needs to have logic checked/updated
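To illustrate what "calculated recursively" could mean here, the following is a minimal sketch, not the Workbench implementation. The `WorkUnit` struct, its fields, and the rule that a leaf contributes walltime times its core count are all assumptions made for illustration. The point is only that a parent's CPU time is the sum over its children, so the scaling factor (CPU time divided by wall-clock time) reflects the parallel work underneath.

```ruby
# Hypothetical stand-in for a Workbench work unit; the real class and its
# fields are not shown in this ticket.
WorkUnit = Struct.new(:label, :started_at, :finished_at, :cores, :children) do
  # Wall-clock seconds between start and finish of this unit alone.
  def walltime
    return 0.0 unless started_at && finished_at
    finished_at - started_at
  end

  # CPU seconds. A leaf contributes walltime * cores (an assumption);
  # a parent sums the CPU time of its children recursively.
  def cputime
    kids = children || []
    return walltime * (cores || 1) if kids.empty?
    kids.sum(&:cputime)
  end

  # Scaling factor: how much CPU time was packed into each wall-clock second.
  def scaling
    wall = walltime
    wall.zero? ? 0.0 : (cputime / wall).round(1)
  end
end

# Usage: four children running in parallel for the same 100 seconds.
start = Time.utc(2017, 1, 1)
steps = 4.times.map { |i| WorkUnit.new("step#{i}", start, start + 100, 1, []) }
workflow = WorkUnit.new("workflow", start, start + 100, 1, steps)
puts workflow.scaling
```

With this shape, a top-level unit whose children ran in parallel reports a scaling factor greater than 1.0, rather than the "1.0x" you get when only the top-level unit's own time is counted.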
Files
Updated by Tom Morris about 8 years ago
- Subject changed from [CWL][Workbench] Top level job stats misleading to [CWL][Workbench] Update stats calculations for CWL running on Crunch v1
- Description updated (diff)
- Assigned To deleted (Tom Morris)
- Target version set to Arvados Future Sprints
- Story points set to 1.0
Updated by Tom Morris about 8 years ago
- Target version changed from Arvados Future Sprints to 2017-01-04 sprint
Updated by Ward Vandewege about 8 years ago
- Target version changed from 2017-01-04 sprint to Arvados Future Sprints
Updated by Tom Morris almost 8 years ago
- Assigned To set to Radhika Chippada
- Target version changed from Arvados Future Sprints to 2017-01-18 sprint
Updated by Radhika Chippada almost 8 years ago
Tom M:
I do not have access to this cluster, but searching on qr1hi I found the following:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-mipcvxo8wbz7b6l#Components - 1.1 times scaling
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-dyze3ur68v1dy0l - 2.4 times scaling
It appears that here the timing stats are calculated correctly.
Can you please add an image of the pipeline and its JSON (you can get it from the Advanced tab) to the ticket so that I can debug? If this doesn't help, I will ask Nico for access to this env. Thanks.
Updated by Radhika Chippada almost 8 years ago
I noticed one issue with finished_at time while debugging this issue. In #10671 we addressed the scenario where finished_at was set on the server if the client did not set it, when a pipeline is finished. However, we still have missing finished_at on pipelines that were already completed prior to this fix.
Discussed with Tom and we agreed to add a migration script to set the finished_at time to the pipeline's modified_at time (which must have been set at the time of completion or later on if something else changed on it).
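The backfill rule described above can be sketched as follows. This is a plain-Ruby illustration, not the actual migration script: the `Pipeline` struct, the `backfill_finished_at` helper, and the choice of `Complete` and `Failed` as the finished states are assumptions for the example.

```ruby
# Hypothetical in-memory record; the real migration would operate on
# pipeline instance rows in the database.
Pipeline = Struct.new(:uuid, :state, :finished_at, :modified_at)

# Assumption: these are the states that count as "finished".
FINISHED_STATES = %w[Complete Failed].freeze

# Set finished_at to modified_at on finished pipelines that lack it.
# Pipelines that already have finished_at, or are not finished, are untouched.
def backfill_finished_at(pipelines)
  pipelines.each do |pipeline|
    next unless FINISHED_STATES.include?(pipeline.state)
    next unless pipeline.finished_at.nil?
    pipeline.finished_at = pipeline.modified_at
  end
end
```

Because `modified_at` may have been bumped by a later change, the backfilled value is an upper bound on the true completion time rather than an exact timestamp.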
Updated by Radhika Chippada almost 8 years ago
Branch 10516-set-finished-at-on-finished-pipelines @ ccb95499 addresses note 7 (sets the finished_at timestamp when it is missing on a finished pipeline instance).
Updated by Lucas Di Pentima almost 8 years ago
Branch 10516-set-finished-at-on-finished-pipelines
LGTM
Updated by Radhika Chippada almost 8 years ago
Branch 10516-workbench-stats-logic @ b38aee84aea043eb7bcb3acbab0a8ef64edf0838
Addresses the scaling factor error.
The image "before-fix" is what is there in production now, and "with-fix" is what we see after the scaling factor correction.
Tests @ https://ci.curoverse.com/job/developer-run-tests/135/ (I fixed the one failing unit test after this run).
Updated by Radhika Chippada almost 8 years ago
- File before-fix.png before-fix.png added
- File with-fix.png with-fix.png added
Updated by Lucas Di Pentima almost 8 years ago
Reviewing branch 10516-workbench-stats-logic:
Updates LGTM, but maybe an additional test case is needed to cover the case where cputime > walltime? That is, a test that would have alerted us to this bug earlier.
Updated by Radhika Chippada almost 8 years ago
@ 6ac8cc38
Updated the unit/work_unit_test.rb to compare cputime and walltime.
Tests @ https://ci.curoverse.com/job/developer-run-tests/137/
Updated by Radhika Chippada almost 8 years ago
- Status changed from New to Resolved
Applied in changeset arvados|commit:554fe927169e928d91c2d8c4bed158aef4d4d746.