Task #13334

Story #13216: Write phasing imputation workflow

Task #13300: Write phasing workflow with beagle 4.1

Review

Added by Jiayong Li over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Start date:
04/05/2018
Due date:
04/10/2018
% Done:

100%

Estimated time:
8.00 h

Description

Reviewing branch 13216-phasing-imputation-workflow on l7g-ml
with workflows in project su92l-j7d0g-049ipwfxdg21tun
Specifically the following runs:

History

#1 Updated by Keldin Sergheyev over 1 year ago

  • Description updated (diff)
  • Status changed from New to In Progress
  • Start date set to 04/05/2018

Beginning questions:

  • Do commands look right/use appropriate flags and resources?
  • Do logs show any errors?
  • Does output look correct? (Cannot "grade" accuracy)
  • Is output meaningful?
  • Open-ended: How were errors I encountered overcome?

#3 Updated by Jiayong Li over 1 year ago

ShellCommandRequirement makes CWL run the command line through a shell, so shell operators such as && or | work; see http://www.commonwl.org/v1.0/CommandLineTool.html#ShellCommandRequirement.

#5 Updated by Keldin Sergheyev over 1 year ago

Review of su92l-xvhdp-zb7seb8e430gfmy beagle.cwl:

  • Do commands look right/use appropriate flags and resources?
    • Yes ✓ It is a single beagle command chained with && to a tabix command that indexes the output. The beagle jar file is in the docker container.
    • 1 thread, default window size ✓
    • 1k genomes reference in bref format ✓
    • Genetic map is GRCh37, which matches the HG19 genome on every chromosome except chrM ✓
    • Target is a VCF of chromosome 19 of GS12877, which uses HG19 as its reference. ✓
    • Dockerfile in l7g-ml repo looks fine. ✓
  • Do logs show any errors?
    • None ✓
  • Does output look correct? (Cannot "grade" accuracy)
    • Desired outputs are captured. Imputed VCF looks correct. The beagle stdout only reports the time spent on each phasing window, so sending it to the log is fine (the timing information can still be retrieved from the log).
  • Is output meaningful?
    • Input has already been phased by eagle, so presumably beagle performs no phasing of its own. The exceptions would be beagle phasing a position that was passed through by eagle, or overwriting eagle's phasing, but both are unlikely because eagle and beagle use the same reference panel. ✓
    • Output is indeed phased and imputed. ✓
  • Open-ended: How were errors I encountered overcome?
    • Main problem encountered was ERROR:duplicate allele at 19:_____, but that was for an unphased input. Perhaps eagle discards records with duplicate alleles?
  • Remaining issues or questions:
    • 1k genomes data is technically whole genome, but within the VCF beagle (and perhaps eagle as well) ignores END= tags (runs of reference).
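
Two of the checks above (output is phased, no duplicate alleles in the input) can be spot-checked with a short script. This is only a sketch, not part of the reviewed workflow. The function names are made up for illustration, and it assumes the "duplicate allele" error refers to a record whose REF/ALT list repeats an allele. It reads uncompressed VCF text lines, so a .vcf.gz would need to be opened with gzip first.

```python
def _records(vcf_lines):
    """Yield the tab-split fields of each data line, skipping headers and blanks."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")


def unphased_genotypes(vcf_lines):
    """Yield (chrom, pos, sample_index, gt) for every genotype written with '/'."""
    for fields in _records(vcf_lines):
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for i, sample in enumerate(fields[9:]):
            gt = sample.split(":")[gt_idx]
            # '|' separates phased alleles, '/' separates unphased ones
            if "/" in gt:
                yield fields[0], int(fields[1]), i, gt


def duplicate_alleles(vcf_lines):
    """Yield (chrom, pos, alleles) for records whose REF/ALT list repeats an allele."""
    for fields in _records(vcf_lines):
        alleles = [fields[3]] + fields[4].split(",")
        # Compare case-insensitively (assumption: case does not distinguish alleles)
        if len({a.upper() for a in alleles}) < len(alleles):
            yield fields[0], int(fields[1]), alleles
```

An empty result from unphased_genotypes on the beagle output would support "Output is indeed phased". Running duplicate_alleles on the unphased input and on the eagle output would show whether eagle drops such records.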

#6 Updated by Keldin Sergheyev over 1 year ago

  • Due date set to 04/10/2018
  • % Done changed from 0 to 50
  • Estimated time set to 8.00 h
  • Remaining (hours) set to 8.0

#9 Updated by Keldin Sergheyev over 1 year ago

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100
  • Remaining (hours) changed from 8.0 to 0.0

Checked scattering workflows. All looks good.
