Bug #10586

closed

Python keep client (CollectionWriter) appears to deadlock

Added by Joshua Randall over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Story points: -

Description

Some of the jobs across our cluster (those that are neither stuck due to #10585 nor among the handful that are still running) appear to be stuck in our Python crunch script, in one of the calls to arvados.CollectionWriter().

Our crunch script is stuck after printing "...writing output to keep" but before "...validating it", which means it is in one of these three calls: https://github.com/wtsi-hgi/arvados-pipelines/blob/master/crunch_scripts/gatk-haplotypecaller-cram.py#L73-L80

It seems likely that issue #10585 and this one share the same underlying cause: some sort of deadlock in the Python keep client. This assumes that arv-mount has some supervisor process that eventually notices things are hung and kills them off, whereas our crunch script has no such supervisor.
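To narrow down which of the three calls is hanging, each one could be run under a simple watchdog. The following is only a minimal sketch, not code from the crunch script or the Arvados SDK: `run_step`, `first_hung_step`, and the step names are hypothetical, and the lambdas are stand-ins for the real calls between the two log messages.

```python
import threading


def run_step(name, fn, timeout_s):
    """Run fn() in a daemon thread and report whether it finished in time.

    Returns (finished, value). A step still running after timeout_s
    seconds is reported as not finished and is left running in its
    daemon thread, because Python cannot forcibly stop a thread.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # keep the error so the caller sees it
            outcome["error"] = exc

    worker = threading.Thread(target=target, name=name, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome.get("value")


def first_hung_step(steps, timeout_s):
    """Run (name, fn) pairs in order; return the first name that hangs, else None."""
    for name, fn in steps:
        finished, _ = run_step(name, fn, timeout_s)
        if not finished:
            return name
    return None


# Hypothetical stand-ins for the three calls; the middle one blocks forever.
blocker = threading.Event()  # never set, so wait() behaves like a deadlocked call
steps = [
    ("create_writer", lambda: "ok"),
    ("write_output", lambda: blocker.wait()),
    ("finish", lambda: "ok"),
]
hung = first_hung_step(steps, timeout_s=0.2)
print(hung)
```

With these stand-ins the script reports `write_output` as the step that did not return. In a real job the timeout would need to be much longer than any legitimate write to Keep, and a hung step would still hold whatever locks it had acquired, so this only locates the hang rather than recovering from it.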


Subtasks 1 (0 open, 1 closed)

Task #10617: Review 10586-writer-pool-deadlock (Resolved, Tom Clegg, 11/22/2016)

Related issues

Related to Arvados - Bug #10585: crunch doesn't end jobs when their arv-mount dies (Resolved, Tom Clegg, 11/22/2016)