Bug #16169

closed

tiling workflow cancelled for unknown reason

Added by Jiayong Li over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -
Release relationship: Auto

Description

Running tiling workflow but it gets cancelled. https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-mzrysxcgtubgva9

I tried various runtime constraints and workflow parameters, but every run gets cancelled.

Before su92l was upgraded, I ran a workflow of the same scale (input also around 2TB), and it was successful. https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-nm507pzmjqiai4s

Contrasting individual jobs from these two runs, https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-vdlq5f0hqldttso completed but https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-t3dtsqsi3vqfetb is cancelled.


Subtasks: 1 (0 open, 1 closed)

Task #16187: Review 16169-cwl-hints (Resolved, Peter Amstutz, 03/02/2020)
Actions #1

Updated by Jiayong Li over 4 years ago

I changed "no_listing" from "hints" to "requirements", but the run still failed: https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-jqx484v754z4vzl

Actions #2

Updated by Lucas Di Pentima over 4 years ago

  • Target version changed from To Be Groomed to 2020-02-26 Sprint
  • Assigned To set to Lucas Di Pentima
  • Status changed from New to In Progress
  • Category set to Crunch

It seems that the container is getting OOM-killed.

We're also getting a warning in the log:

Warning: cwltool: ../../lib/cwl/workflow.json:1:25668: Recursive directory listing has resulted in a large number of
                                     File objects (1733821) passed to the input parameter 'fjdir'. 
                                     This may negatively affect workflow performance and memory use.

                                     If this is a problem, use the hint
                                     'cwltool:LoadListingRequirement' with "shallow_listing" or
                                     "no_listing" to change the directory listing behavior:

                                     $namespaces:
                                       cwltool: "http://commonwl.org/cwltool#" 
                                     hints:
                                       cwltool:LoadListingRequirement:
                                         loadListing: shallow_listing

...but the workflow already has the no_listing hint from previous (pre 2.0) successful runs. Maybe this hint is being ignored?

Actions #3

Updated by Jiayong Li over 4 years ago

Specifying "no_listing" at the workflow level was ignored, but specifying "no_listing" at the job level works:
https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-lpvofphpiwrtqan
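
For anyone hitting the same problem, here is a rough sketch of what declaring the hint at the job (tool) level might look like. It reuses the `cwltool:LoadListingRequirement` hint and the `fjdir` input name from the warning above. Everything else is an assumption for illustration only: the `Directory` type of `fjdir`, the placeholder command, and the empty outputs are not taken from the actual tiling workflow.

```yaml
# Hypothetical tool definition; only the hint and the input name 'fjdir'
# come from this ticket, the rest is a placeholder.
cwlVersion: v1.0
class: CommandLineTool
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
hints:
  # Declared on the tool itself rather than on the enclosing workflow.
  cwltool:LoadListingRequirement:
    loadListing: no_listing
inputs:
  fjdir:
    type: Directory    # assumed type; the warning mentions a directory listing
baseCommand: [ls]      # placeholder command
outputs: []
```

With the hint on the tool, the listing behavior does not depend on the workflow-level hint being propagated to the step.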

Actions #4

Updated by Peter Amstutz over 4 years ago

  • Target version changed from 2020-02-26 Sprint to 2020-03-11 Sprint
Actions #5

Updated by Peter Amstutz over 4 years ago

  • Assigned To changed from Lucas Di Pentima to Peter Amstutz
Actions #6

Updated by Lucas Di Pentima over 4 years ago

Updates at b12b6c014f0e26fb4c2c2a5ad27a36c3685babf1 - branch 16169-cwl-hints

I was able to reproduce the bug via an a-c-r integration test. I'm handing this off to Peter, as I'm a bit stuck and it would be great to have it done for 2.0.1.

Actions #7

Updated by Peter Amstutz over 4 years ago

  • Release set to 29
Actions #8

Updated by Peter Amstutz over 4 years ago

For some reason, this bug appears when the workflow is submitted with --submit and run in a container, but not when it is run directly on the host with --local.

Actions #10

Updated by Lucas Di Pentima over 4 years ago

This LGTM, thanks!

Actions #11

Updated by Peter Amstutz over 4 years ago

  • Status changed from In Progress to Resolved