Bug #531

closed

Missed SNPs via CGI 1.3 to GFF conversion script

Added by Evan Maxwell about 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Target version: -
Story points: -

Description

In the Python script that converts CGI files into our GFF format (source:server/conversion/cgi1.3_to_gff.py), certain SNPs are missed (i.e., not reported in the GFF output) because of the conditional statement at line 63. For example, see locus 1109 in file var-GS06985-1100-36-ASM.tsv:

locus  ploidy  allele  chromosome  begin  end    varType  reference  alleleSeq  totalScore  hapLink  xRef
1109   2       1       chr1        47860  47861  ref      A          A          135         453
1109   2       1       chr1        47861  47862  snp      G          T          135         453      dbsnp.100:rs2531231
1109   2       2       chr1        47860  47862  sub      AG         GT         135         454      dbsnp.100:rs2531230;dbsnp.100:rs2531231

In this case, we should report a SNP like "alleles AT/GT;ref_allele AG". It is missed because the script expects to find a line for allele 1 whose begin/end positions match those of a line for allele 2. Here, however, the two allele 1 lines match the allele 2 line only when considered in combination.
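
One way to handle this case is to concatenate each allele's lines across the locus before comparing spans. The sketch below is not the code in cgi1.3_to_gff.py; the function names and the dict-based row layout are hypothetical. It assumes each allele's lines are contiguous and that coordinates have already been parsed as integers.

```python
from collections import defaultdict


def combine_locus_alleles(rows):
    """Concatenate each allele's segments (sorted by begin) into one record.

    `rows` is a list of dicts for a single locus, keyed by the CGI column
    names (hypothetical in-memory layout, not the script's own).
    """
    by_allele = defaultdict(list)
    for row in rows:
        by_allele[row["allele"]].append(row)

    combined = {}
    for allele, segments in by_allele.items():
        segments.sort(key=lambda r: r["begin"])
        # Assumption: segments of one allele tile the region without gaps.
        for prev, nxt in zip(segments, segments[1:]):
            if prev["end"] != nxt["begin"]:
                raise ValueError("non-contiguous segments for allele %s" % allele)
        combined[allele] = {
            "begin": segments[0]["begin"],
            "end": segments[-1]["end"],
            "reference": "".join(s["reference"] for s in segments),
            "alleleSeq": "".join(s["alleleSeq"] for s in segments),
        }
    return combined


def describe_locus(rows):
    """Return an 'alleles X/Y;ref_allele Z' string, or None if nothing to report."""
    combined = combine_locus_alleles(rows)
    spans = {(c["begin"], c["end"]) for c in combined.values()}
    refs = {c["reference"] for c in combined.values()}
    if len(spans) != 1 or len(refs) != 1:
        return None  # alleles do not cover the same region; leave unreported
    ref = refs.pop()
    seqs = sorted({c["alleleSeq"] for c in combined.values()})
    if seqs == [ref]:
        return None  # both alleles match the reference
    return "alleles {};ref_allele {}".format("/".join(seqs), ref)


# Locus 1109 from var-GS06985-1100-36-ASM.tsv, reduced to the columns used here.
LOCUS_1109 = [
    {"allele": 1, "begin": 47860, "end": 47861, "reference": "A", "alleleSeq": "A"},
    {"allele": 1, "begin": 47861, "end": 47862, "reference": "G", "alleleSeq": "T"},
    {"allele": 2, "begin": 47860, "end": 47862, "reference": "AG", "alleleSeq": "GT"},
]

print(describe_locus(LOCUS_1109))  # alleles AT/GT;ref_allele AG
```

Comparing the combined spans rather than individual lines means allele 1 (two lines) and allele 2 (one line) are recognized as covering the same region, 47860 to 47862.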

Actions #1

Updated by Madeleine Ball about 13 years ago

  • Status changed from New to In Progress
  • Assigned To set to Madeleine Ball
Actions #2

Updated by Madeleine Ball about 13 years ago

I've pushed my fix. It still skips any het allele regions (alleles reported separately) if any part of the region is not called (even if it's just part of one allele).
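
The skip rule described here could look roughly like the following. This is an illustrative sketch, not the pushed fix: the helper names are hypothetical, the second example locus is made-up data, and it assumes a no-call is marked by a varType beginning with "no-call" or by "?" in alleleSeq.

```python
def is_no_call(row):
    """True if a row looks uncalled (assumed markers: 'no-call*' varType or '?')."""
    return row["varType"].startswith("no-call") or "?" in row["alleleSeq"]


def should_skip_region(rows):
    """Skip the whole het region if any segment of any allele is not called."""
    return any(is_no_call(row) for row in rows)


# Locus 1109 from the description: every segment is called.
FULLY_CALLED = [
    {"allele": 1, "varType": "ref", "alleleSeq": "A"},
    {"allele": 1, "varType": "snp", "alleleSeq": "T"},
    {"allele": 2, "varType": "sub", "alleleSeq": "GT"},
]

# Hypothetical variant of the same locus with one uncalled segment on allele 1.
PARTLY_CALLED = [
    {"allele": 1, "varType": "no-call", "alleleSeq": "?"},
    {"allele": 1, "varType": "snp", "alleleSeq": "T"},
    {"allele": 2, "varType": "sub", "alleleSeq": "GT"},
]

print(should_skip_region(FULLY_CALLED))   # False
print(should_skip_region(PARTLY_CALLED))  # True
```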

Actions #3

Updated by Madeleine Ball almost 13 years ago

  • Status changed from In Progress to Resolved
  • Resolution set to fixed

My fix has been pulled into master, so I'm marking this as resolved.
