Bug #16169
Closed
Tiling workflow cancelled for unknown reason
Description
I am running a tiling workflow, but it gets cancelled: https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-mzrysxcgtubgva9
I tried various runtime constraints and workflow parameters, but every run gets cancelled.
Before su92l was upgraded, I ran a workflow of the same scale (input also around 2TB), and it was successful. https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-nm507pzmjqiai4s
Contrasting individual jobs from these two runs, https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-vdlq5f0hqldttso completed but https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-t3dtsqsi3vqfetb is cancelled.
Updated by Jiayong Li over 4 years ago
I moved "no_listing" from "hints" to "requirements", but the run still failed: https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-jqx484v754z4vzl
Updated by Lucas Di Pentima over 4 years ago
- Target version changed from To Be Groomed to 2020-02-26 Sprint
- Assigned To set to Lucas Di Pentima
- Status changed from New to In Progress
- Category set to Crunch
It seems that the container is getting OOM-killed.
We're also getting a warning on the log:
Warning: cwltool: ../../lib/cwl/workflow.json:1:25668: Recursive directory listing has resulted in a large number of File objects (1733821) passed to the input parameter 'fjdir'. This may negatively affect workflow performance and memory use.
If this is a problem, use the hint 'cwltool:LoadListingRequirement' with "shallow_listing" or "no_listing" to change the directory listing behavior:
    $namespaces:
      cwltool: "http://commonwl.org/cwltool#"
    hints:
      cwltool:LoadListingRequirement:
        loadListing: shallow_listing
...but the workflow already has the no_listing hint from previous (pre 2.0) successful runs. Maybe this hint is being ignored?
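To illustrate why the listing mode matters for memory, here is a minimal sketch that counts how many entries each loadListing mode would expose for a directory. It is only an illustration of the idea, not how cwltool or arvados-cwl-runner builds listings; the function name count_listed_entries is hypothetical, and only the three mode names come from the CWL LoadListingRequirement.

```python
import os


def count_listed_entries(root, load_listing="deep_listing"):
    """Count the File/Directory entries a listing of `root` would contain.

    Hypothetical helper: it mimics the three CWL loadListing modes on a
    local directory and does not call cwltool.
    """
    if load_listing == "no_listing":
        # The directory is passed as a single object with no listing.
        return 0
    if load_listing == "shallow_listing":
        # Only the immediate children are enumerated.
        return len(os.listdir(root))
    if load_listing == "deep_listing":
        # Every file and subdirectory at every depth is enumerated.
        total = 0
        for _dirpath, dirnames, filenames in os.walk(root):
            total += len(dirnames) + len(filenames)
        return total
    raise ValueError(f"unknown loadListing value: {load_listing!r}")
```

With a 2TB input directory, the deep count is what grows into the millions of File objects reported in the warning, while no_listing keeps it at zero.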
Updated by Jiayong Li over 4 years ago
Specifying "no_listing" on the workflow got ignored,
but specifying "no_listing" at the job level works:
https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-lpvofphpiwrtqan
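When comparing the two setups, it can help to see exactly where a LoadListingRequirement is declared. The sketch below walks a CWL document that has already been loaded into a Python dict and reports each declaration. It assumes that "job level" above means an individual step or its embedded tool, and it only follows steps whose "run" is an inline process; find_load_listing is a hypothetical helper, not part of cwltool or arvados-cwl-runner.

```python
def find_load_listing(process, path="workflow"):
    """Return (path, section, loadListing) for each LoadListingRequirement.

    Hypothetical helper. `process` is a CWL document loaded as a dict.
    Hints and requirements may be a list of {"class": ...} entries or a
    map keyed by class name; both forms are handled.
    """
    found = []
    for section in ("hints", "requirements"):
        entries = process.get(section) or []
        if isinstance(entries, dict):
            entries = [{"class": cls, **(body or {})} for cls, body in entries.items()]
        for entry in entries:
            # Matches both "LoadListingRequirement" and the
            # "cwltool:LoadListingRequirement" extension spelling.
            if str(entry.get("class", "")).endswith("LoadListingRequirement"):
                found.append((path, section, entry.get("loadListing")))

    steps = process.get("steps") or []
    if isinstance(steps, dict):
        steps = [{"id": step_id, **step} for step_id, step in steps.items()]
    for step in steps:
        step_path = f"{path}/{step.get('id', '?')}"
        # Declarations attached to the step itself.
        step_level = {key: step.get(key) for key in ("hints", "requirements")}
        found.extend(find_load_listing(step_level, step_path))
        # Declarations inside an inline tool or sub-workflow.
        run = step.get("run")
        if isinstance(run, dict):
            found.extend(find_load_listing(run, step_path + "/run"))
    return found
```

Running it on the packed workflow would show whether the setting exists only at the top level (the case that was ignored here) or also on the individual steps.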
Updated by Peter Amstutz over 4 years ago
- Target version changed from 2020-02-26 Sprint to 2020-03-11 Sprint
Updated by Peter Amstutz over 4 years ago
- Assigned To changed from Lucas Di Pentima to Peter Amstutz
Updated by Lucas Di Pentima over 4 years ago
Updates at b12b6c014f0e26fb4c2c2a5ad27a36c3685babf1 - branch 16169-cwl-hints
I was able to reproduce the bug via an a-c-r integration test. Handing this off to Peter, as I'm a bit stuck and it would be great to have it done for 2.0.1.
Updated by Peter Amstutz over 4 years ago
For some reason, this bug appears when the workflow is submitted with --submit and run in a container, but not when it is run directly on the host with --local.
Updated by Peter Amstutz over 4 years ago
16169-cwl-hints @ 1d9e4de7a4ff994cfc7a9319dcae56bb26c272b3
Updated by Peter Amstutz over 4 years ago
- Status changed from In Progress to Resolved