[CWL] Sometimes deadlocks on completion of subworkflow
This is a ticket to follow up on our discussion about deadlocks during bcbio CWL runs. These are intermittent and thus hard to provide a test case for but the behavior is that the run will report on a deadlock with all successfully completed tasks. Restarting a workflow with reuse enabled picks up where it left off and will keep running to completion. The deadlock occurs in multiple places although is most often seen during alignment and variant calling in steps with multiple points of parallelization:
[step vc_output_record] completion status is success [workflow variantcall] outdir is /tmp/user/1001/tmpBDstpi [step variantcall] completion status is success 2016-10-18 07:35:42 arvados.cwl-runner ERROR: Workflow is deadlocked, no runnable jobs and not waiting on any pending jobs. Workflow error, try again with --debug for more information: Workflow failed.
It would be useful if we had better reporting so we could provide useful feedback during failures that would help with isolating the problem.