Bug #13108

closed

arvados-cwl-runner dispatches container requests very slowly

Added by Joshua Randall about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -
Release:
Release relationship: Auto

Description

I have observed that arvados-cwl-runner (a-c-r) seems to only be able to dispatch a container request once every 0.9s or so. We have a workflow in which at one point the workflow is scattered across both samples and genomic regions (this seems like a fairly standard way to split up work). For a variety of reasons, we are using 200 genomic intervals. At some point in our workflow we have n_samples*200 steps ready to run - for example, we are currently running a small test dataset of 147 samples against our GATK 4 pipeline, which results in 29400 invocations of HaplotypeCaller that are typically ready to run at around the same time.

My expectation would be that all 29400 container requests would be submitted within no more than a few minutes, and the subsequent containers and slurm jobs would then be scheduled, also within a few minutes.

What actually happens given the current performance characteristics of a-c-r is that it takes over 7 hours to submit all of the container requests. The result is that we cannot keep our compute nodes full of work, despite there being plenty of work to do.
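The figures above are consistent with each other, as a quick back-of-the-envelope check shows. The snippet below only uses the numbers from this description; the 5-minute target is an assumption standing in for "a few minutes".

```python
# Back-of-the-envelope check using the figures from the description.
n_samples = 147
n_intervals = 200
seconds_per_request = 0.9  # approximate observed interval between dispatches

n_requests = n_samples * n_intervals
total_hours = n_requests * seconds_per_request / 3600

# Assumption: "a few minutes" is taken to mean 5 minutes.
target_minutes = 5
required_rate = n_requests / (target_minutes * 60)  # requests per second

print(f"{n_requests} requests, about {total_hours:.2f} hours at one per {seconds_per_request} s")
print(f"about {required_rate:.0f} requests/s needed to finish in {target_minutes} minutes")
```

So meeting the expectation would need a submission rate roughly two orders of magnitude higher than what is observed.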

Our current workaround is to not attempt to run this workflow at all, and instead to invoke a-c-r once per sample so that each sample has its own RunnerContainer.
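To illustrate why submitting in parallel helps when each request is dominated by round-trip latency, here is a minimal sketch. It is not a-c-r's actual implementation: `submit_request` is a hypothetical stand-in that just sleeps, and the latency and worker count are made-up values.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def submit_request(step_id, latency=0.01):
    """Hypothetical stand-in for one container request submission."""
    time.sleep(latency)  # simulate the per-request round trip
    return f"request-{step_id}"

def submit_serial(step_ids):
    # One request at a time: total time is roughly len(step_ids) * latency.
    return [submit_request(s) for s in step_ids]

def submit_parallel(step_ids, workers=8):
    # Several requests in flight at once; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(submit_request, step_ids))

steps = list(range(40))

start = time.perf_counter()
serial_results = submit_serial(steps)
serial_elapsed = time.perf_counter() - start

start = time.perf_counter()
parallel_results = submit_parallel(steps)
parallel_elapsed = time.perf_counter() - start

print(f"serial:   {serial_elapsed:.2f} s")
print(f"parallel: {parallel_elapsed:.2f} s")
```

Running one a-c-r per sample achieves a similar effect at a coarser level, since each runner submits its own requests independently of the others.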


Subtasks 5 (0 open, 5 closed)

Task #13287: Parallel job submission (Resolved, Peter Amstutz, 04/06/2018)
Task #13288: Review 13108-cwl-parallel-submit (Resolved, Peter Amstutz, 04/06/2018)
Task #13350: Fix Docker image upload (Resolved, Peter Amstutz, 04/06/2018)
Task #13357: Review 13108-acr-threading-fixes (Resolved, Peter Amstutz, 04/06/2018)
Task #13375: Successful run on jenkins (Resolved, Peter Amstutz, 04/06/2018)

Related issues

Related to Arvados - Bug #13351: Benchmark container_request creation and see if there are opportunities for optimization (In Progress, Peter Amstutz)
