Story #17049

Make a single chromosome run of the demo

Added by Peter Amstutz 8 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

https://github.com/arvados/arvados-tutorial

Run 5 samples on su92l, on a single chromosome.


Related issues

Related to Arvados Epics - Story #17001: Ensure Processing Whole Genome Sequences Demo Works in WB2 and Is Updated to Reflect WB2 Features (status: New; start date: 06/30/2021; due date: 08/31/2021)

History

#1 Updated by Peter Amstutz 8 months ago

  • Related to Story #17001: Ensure Processing Whole Genome Sequences Demo Works in WB2 and Is Updated to Reflect WB2 Features added

#2 Updated by Jiayong Li 8 months ago

  • Tracker changed from Bug to Story

#3 Updated by Jiayong Li 7 months ago

  • Description updated

#4 Updated by Jiayong Li 6 months ago

  • Status changed from New to Feedback
  • Description updated

branch 17049-make-singlechrom, commit d2f6d47da38e96de4597bb58ba1193e261af8fb1

I finished a run on chr19 input FASTQs for 5 samples.
https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-3dqeqodgi1usk0r

I encountered a GATK bug in which memory usage spikes while HaplotypeCaller writes the result VCF when the region is outside chr19 (and, more generally, in regions where very few reads are placed). I had to schedule machines with 14 GB of RAM for that step to complete. (See the failed run: https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-v0sq9csiz1p94nx)

I also had to upload a different Docker image for curii/clinvar-report to Keep in order to finish the gvcf-to-vcf step (see the failed run: https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-zulp4y3e9f7tv59). The new Docker image is https://workbench.su92l.arvadosapi.com/collections/su92l-4zz18-kl40n4746vmjnw2

Input FASTQs: https://workbench.su92l.arvadosapi.com/collections/su92l-4zz18-j1jotbx4uqckkec
The run that generated the chr19 input FASTQs: https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-ilo79fgj728afkl

#5 Updated by Peter Amstutz 6 months ago

So the goal of this was to have a relatively lightweight pipeline that still had all the pieces of the whole genome pipeline.

I think this still needs a little work to be lightweight:

I see dozens of HaplotypeCaller instances that each run for only 1-2 minutes. Can we consolidate them?

Same for SelectVariants/BaseRecalibrator/ApplyBQSR. Can we either (a) not scatter them, or (b) use the RunInSingleContainer feature?
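As a rough illustration of consolidating the many short HaplotypeCaller jobs, the sketch below packs small intervals into a few batches so that each scattered job covers several intervals instead of one. It is a hypothetical helper (`batch_intervals` is not part of arvados-tutorial), assuming intervals are `(contig, start, end)` tuples with 1-based inclusive coordinates. It uses the greedy longest-processing-time-first heuristic, balancing batches by total interval length as a stand-in for runtime.

```python
from typing import List, Tuple

# (contig, start, end), 1-based inclusive; an assumed representation.
Interval = Tuple[str, int, int]


def batch_intervals(intervals: List[Interval], num_batches: int) -> List[List[Interval]]:
    """Pack intervals into at most num_batches groups of roughly equal total length."""
    if num_batches < 1:
        raise ValueError("num_batches must be >= 1")
    batches: List[List[Interval]] = [[] for _ in range(num_batches)]
    totals = [0] * num_batches
    # Longest intervals first; each goes to the batch with the smallest total so far.
    for iv in sorted(intervals, key=lambda iv: iv[2] - iv[1] + 1, reverse=True):
        i = totals.index(min(totals))
        batches[i].append(iv)
        totals[i] += iv[2] - iv[1] + 1
    # Drop empty batches (fewer intervals than batches) and restore genomic order.
    return [sorted(b, key=lambda iv: (iv[0], iv[1])) for b in batches if b]
```

For example, 40 equal 1 kb intervals on chr19 with `num_batches=4` yield 4 batches of 10 intervals each, i.e. 4 HaplotypeCaller jobs instead of 40. Option (b), RunInSingleContainer, keeps the scatter as written but runs it inside one container, so it needs no change to the interval lists.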

#6 Updated by Jiayong Li 6 months ago

branch 17049-make-singlechrom, commit d2f6d47da38e96de4597bb58ba1193e261af8fb1

I changed the fullintervallist to restrict calling to chr19 only; hopefully this run is better.
https://workbench.su92l.arvadosapi.com/container_requests/su92l-xvhdp-1l4x28f6q1rn955
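A minimal sketch of restricting an interval list to chr19, assuming `fullintervallist` points at a Picard-style `.interval_list` file (a SAM-style `@` header followed by tab-separated contig, start, end, strand, name records); if the tutorial uses a BED file instead, the header handling would differ. The helper name `restrict_interval_list` is hypothetical. The whole header, including every `@SQ` line, is kept so the sequence dictionary still matches the full reference.

```python
from typing import Iterable, List


def restrict_interval_list(lines: Iterable[str], contig: str = "chr19") -> List[str]:
    """Return the header plus only the interval records on `contig`.

    Assumes Picard interval_list layout: '@' header lines, then
    tab-separated records whose first field is the contig name.
    """
    kept: List[str] = []
    for line in lines:
        if line.startswith("@"):
            # Keep the whole header (@HD, @SQ, ...) so the sequence
            # dictionary still matches the whole-genome reference.
            kept.append(line)
        elif line.strip() and line.split("\t", 1)[0] == contig:
            # Body record on the requested contig.
            kept.append(line)
    return kept
```

To use it on a file, pass an open file handle (each line keeps its newline) and write the result with `writelines()`.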

#7 Updated by Jiayong Li 5 months ago

  • Status changed from Feedback to Resolved

commit d147d1d1fafeeea06bd09d9479337b0f5aab43b0

Added comments to the single-chromosome (chr19) YAML.
