Idea #17049

closed

Make a single chromosome run of the demo

Added by Peter Amstutz over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Jiayong Li
Category:
-
Target version:
-
Start date:
Due date:
Story points:
-

Description

https://github.com/arvados/arvados-tutorial

Run 5 samples on su92l, on a single chromosome.


Related issues

Related to Arvados Epics - Idea #17001: Arvados uses WB2 by default (Resolved)
#1

Updated by Peter Amstutz over 3 years ago

  • Related to Idea #17001: Arvados uses WB2 by default added
#2

Updated by Jiayong Li over 3 years ago

  • Tracker changed from Bug to Idea
#3

Updated by Jiayong Li over 3 years ago

  • Description updated (diff)
#4

Updated by Jiayong Li over 3 years ago

  • Status changed from New to Feedback
  • Description updated (diff)

branch 17049-make-singlechrom, commit d2f6d47da38e96de4597bb58ba1193e261af8fb1

I finished a run using chr19 input FASTQs for 5 samples.
https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-3dqeqodgi1usk0r

I encountered a GATK bug where memory usage spikes while writing the result VCF in the HaplotypeCaller step when the region is outside chr19 (and, more generally, in regions where very few reads are placed). I had to schedule 14G RAM machines for that step to complete. (See the failed run: https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-v0sq9csiz1p94nx)

I also had to upload a different Docker image for curii/clinvar-report to Keep in order to finish the gvcf-to-vcf step (see the failed run: https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-zulp4y3e9f7tv59). The new Docker image is https://workbench.su92l.arvadosapi.com/collections/su92l-4zz18-kl40n4746vmjnw2

The input FASTQs are here: https://workbench.su92l.arvadosapi.com/collections/su92l-4zz18-j1jotbx4uqckkec
The run that generated the chr19 input FASTQs is https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-ilo79fgj728afkl

#5

Updated by Peter Amstutz over 3 years ago

So the goal of this was to have a relatively lightweight pipeline that still had all the pieces of the whole-genome pipeline.

I think this still needs a little work to be lightweight:

I see dozens of haplotype caller instances that are each running for 1-2 minutes. Can we consolidate that?

Same for SelectVariants/BaseRecalibrator/ApplyBQSR. Can we either (a) not scatter or (b) use the RunInSingleContainer feature?
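A minimal sketch of the consolidation idea, assuming the scatter is driven by a list of (contig, start, end) intervals: rather than one HaplotypeCaller job per interval, the intervals could be packed into a handful of groups of roughly equal total length, with one job per group. The names `group_intervals` and `interval_length` and the greedy longest-first heuristic are hypothetical, illustrative choices, not part of the tutorial workflow, whose actual scatter logic may differ.

```python
import heapq
from typing import List, Tuple

# Hypothetical interval type: (contig, start, end), 1-based and inclusive.
Interval = Tuple[str, int, int]


def interval_length(iv: Interval) -> int:
    _, start, end = iv
    return end - start + 1


def group_intervals(intervals: List[Interval], n_groups: int) -> List[List[Interval]]:
    """Pack intervals into n_groups bins of roughly equal total length.

    Greedy longest-first heuristic: each interval, largest first, goes into
    the bin with the smallest total so far.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # Min-heap of (total_length, bin_index); the lightest bin is always on top.
    heap = [(0, i) for i in range(n_groups)]
    heapq.heapify(heap)
    groups: List[List[Interval]] = [[] for _ in range(n_groups)]
    for iv in sorted(intervals, key=interval_length, reverse=True):
        total, idx = heapq.heappop(heap)
        groups[idx].append(iv)
        heapq.heappush(heap, (total + interval_length(iv), idx))
    # Sort each group by (contig, start) so each job sees ordered intervals.
    for group in groups:
        group.sort(key=lambda iv: (iv[0], iv[1]))
    return groups


if __name__ == "__main__":
    # Made-up chr19 intervals standing in for the workflow's interval list.
    intervals = [
        ("chr19", 1, 1000),
        ("chr19", 2001, 2500),
        ("chr19", 3001, 3100),
        ("chr19", 4001, 5000),
    ]
    for group in group_intervals(intervals, 2):
        print(group)
```

With `n_groups=1` this degenerates to option (a), a single unscattered job; a small `n_groups` keeps some parallelism while avoiding dozens of 1-2 minute containers.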

#6

Updated by Jiayong Li over 3 years ago

branch 17049-make-singlechrom, commit d2f6d47da38e96de4597bb58ba1193e261af8fb1

I changed the fullintervallist to cover chr19 only; hopefully this run is better.
https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-1l4x28f6q1rn955
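For illustration only, restricting a Picard-style interval list to chr19 could look like the sketch below. It assumes the list consists of `@`-prefixed SAM header lines followed by tab-separated records (contig, start, end, strand, name) and that the contig is named `chr19`; the actual change to `fullintervallist` in the branch may have been made differently (for example, by editing the file by hand). The function name `restrict_to_contig` and the sample records are hypothetical.

```python
from typing import Iterable, Iterator


def restrict_to_contig(lines: Iterable[str], contig: str = "chr19") -> Iterator[str]:
    """Yield header lines unchanged and only the interval records on `contig`.

    Assumes a Picard-style interval list: '@'-prefixed SAM header lines
    followed by tab-separated records of contig, start, end, strand, name.
    Blank lines and records on other contigs are dropped.
    """
    for line in lines:
        if line.startswith("@"):
            # Keep the whole header, including the sequence dictionary.
            yield line
        elif line.split("\t", 1)[0] == contig:
            yield line


if __name__ == "__main__":
    # @SQ lengths are GRCh38 values; the interval records are made up.
    sample = [
        "@HD\tVN:1.6\n",
        "@SQ\tSN:chr1\tLN:248956422\n",
        "@SQ\tSN:chr19\tLN:58617616\n",
        "chr1\t10001\t207666\t+\tinterval_1\n",
        "chr19\t60001\t2488387\t+\tinterval_2\n",
    ]
    print("".join(restrict_to_contig(sample)), end="")
```

The header is passed through unchanged since it carries the sequence dictionary, which downstream tools may compare against the reference.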

#7

Updated by Jiayong Li over 3 years ago

  • Status changed from Feedback to Resolved

commit d147d1d1fafeeea06bd09d9479337b0f5aab43b0

Added comments to the single-chromosome (chr19) YAML file.
