[CWL][Workbench] Update stats calculations for CWL running on Crunch v1
In looking at this job: https://workbench.e51c5.arvadosapi.com/pipeline_instances/e51c5-d1hrv-23r02rmoqfq5tew
I see a couple of things misleading about the times being reported:
The node scaling is reported as "1.0x" even though the CWL workflow was scattered across a large number of parallel nodes. The time from each work unit needs to be propagated up to the top level.
Do work unit stats need to be calculated recursively?
The logic in PipelineInstancesHelper.determine_wallclock_runtime needs to be checked and updated.
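To make the recursive calculation concrete, here is a minimal sketch of how CPU time could be summed up from child work units so that the top-level scaling factor reflects the scatter. This is not the Workbench implementation: `WorkUnitSketch` and its fields (`started_at`, `finished_at`, `nodes`, `children`) are hypothetical names, and treating a parent's CPU time as the sum of its children's is an assumption.

```ruby
# Hypothetical stand-in for a work unit; NOT the actual Workbench class.
WorkUnitSketch = Struct.new(:started_at, :finished_at, :nodes, :children) do
  # Wall-clock seconds between start and finish (0 if either is missing).
  def walltime
    return 0 unless started_at && finished_at
    finished_at - started_at
  end

  # Assumption: a leaf contributes walltime * node count; a parent
  # contributes the recursive sum of its children's CPU time.
  def cputime
    if children.nil? || children.empty?
      walltime * (nodes || 1)
    else
      children.sum(&:cputime)
    end
  end

  # Scaling factor: CPU time divided by wall-clock time.
  def scaling
    walltime > 0 ? cputime.to_f / walltime : 0.0
  end
end

# Four children run in parallel for 100 s each under one 100 s parent.
t0 = Time.at(0)
t1 = Time.at(100)
kids = Array.new(4) { WorkUnitSketch.new(t0, t1, 1, []) }
top = WorkUnitSketch.new(t0, t1, 1, kids)
puts format("%.1fx", top.scaling)
```

With a non-recursive calculation the top-level unit would only see its own single node and report 1.0x, which matches the symptom described above.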
#1 Updated by Tom Morris over 2 years ago
- Subject changed from [CWL][Workbench] Top level job stats misleading to [CWL][Workbench] Update stats calculations for CWL running on Crunch v1
- Description updated (diff)
- Assigned To deleted (
- Target version set to Arvados Future Sprints
- Story points set to 1.0
#6 Updated by Radhika Chippada over 2 years ago
I do not have access to this cluster, but searching on qr1hi I found the following:
It appears that here the timing stats are calculated correctly.
Can you please add an image of the pipeline and its JSON (you can get it from the Advanced tab) to the ticket so that I can debug? If this doesn't help, I will ask Nico for access to this environment. Thanks.
#7 Updated by Radhika Chippada over 2 years ago
While debugging this issue, I noticed a problem with the finished_at time. In #10671 we made the server set finished_at when a pipeline finishes if the client did not set it. However, pipelines that completed before that fix are still missing finished_at.
Discussed with Tom, and we agreed to add a migration script that sets finished_at to the pipeline's modified_at time (which must have been set at the time of completion, or later if something else on the pipeline changed).
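The backfill rule can be sketched in plain Ruby as follows. This only illustrates the logic, not the actual migration: the hash layout, the `backfill_finished_at` helper, and the set of state names treated as "finished" are all assumptions.

```ruby
# Assumed names for states that count as finished; adjust to the real model.
FINISHED_STATES = %w[Complete Failed].freeze

# Hypothetical helper: for finished pipelines that lack finished_at,
# fall back to modified_at. Existing finished_at values are left alone.
def backfill_finished_at(pipelines)
  pipelines.each do |p|
    next unless FINISHED_STATES.include?(p[:state])
    p[:finished_at] ||= p[:modified_at]
  end
end

pipelines = [
  { state: "Complete", finished_at: nil, modified_at: Time.at(50) },
  { state: "RunningOnServer", finished_at: nil, modified_at: Time.at(60) },
  { state: "Complete", finished_at: Time.at(10), modified_at: Time.at(70) }
]
backfill_finished_at(pipelines)
```

Only the first record changes: the second is not finished, and the third already has a finished_at value.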
#10 Updated by Radhika Chippada over 2 years ago
Branch 10516-workbench-stats-logic @ b38aee84aea043eb7bcb3acbab0a8ef64edf0838
Addresses the scaling factor error.
The image "before-fix" is what is there in production now, and "with-fix" is what we see after the scaling factor correction.
Tests @ https://ci.curoverse.com/job/developer-run-tests/135/ (I fixed the one failing unit test after this run).
#13 Updated by Radhika Chippada over 2 years ago
Updated unit/work_unit_test.rb to compare cputime and walltime.