Bug #531

Missed SNPs via CGI 1.3 to GFF conversion script

Added by Evan Maxwell over 11 years ago. Updated about 11 years ago.

In the Python script that converts CGI files into our GFF format (source:server/conversion/cgi1.3_to_gff.py), certain SNPs are missed (i.e., not reported in the GFF output) because of the conditional statement at line 63. For example, see locus 1109 in file var-GS06985-1100-36-ASM.tsv:

locus  ploidy  allele  chromosome  begin  end    varType  reference  alleleSeq  totalScore  hapLink  xRef
1109   2       1       chr1        47860  47861  ref      A          A          135         453
1109   2       1       chr1        47861  47862  snp      G          T          135         453      dbsnp.100:rs2531231
1109   2       2       chr1        47860  47862  sub      AG         GT         135         454      dbsnp.100:rs2531230;dbsnp.100:rs2531231
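In the var file these rows are tab-separated. One way to load them for experimentation is Python's csv module. This is a minimal sketch, not code from cgi1.3_to_gff.py: the helper name read_var_rows is hypothetical. It skips blank lines and lines starting with "#", and strips a leading ">" from the header line if one is present (an assumption about the file layout).

```python
import csv
import io

HEADER = ("locus ploidy allele chromosome begin end varType reference "
          "alleleSeq totalScore hapLink xRef").split()

# The three locus-1109 rows from the example; the first has an empty xRef.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    HEADER,
    ["1109", "2", "1", "chr1", "47860", "47861", "ref", "A", "A", "135", "453", ""],
    ["1109", "2", "1", "chr1", "47861", "47862", "snp", "G", "T", "135", "453",
     "dbsnp.100:rs2531231"],
    ["1109", "2", "2", "chr1", "47860", "47862", "sub", "AG", "GT", "135", "454",
     "dbsnp.100:rs2531230;dbsnp.100:rs2531231"],
]) + "\n"


def read_var_rows(lines):
    """Parse tab-separated var-file lines into dicts keyed by column name."""
    # Drop blank lines and "#" metadata lines (assumed layout).
    lines = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    if lines:
        lines[0] = lines[0].lstrip(">")  # tolerate a ">locus ..." header
    rows = []
    for row in csv.DictReader(lines, delimiter="\t"):
        row["begin"] = int(row["begin"])
        row["end"] = int(row["end"])
        rows.append(row)
    return rows


rows = read_var_rows(io.StringIO(SAMPLE))
for r in rows:
    print(r["allele"], r["begin"], r["end"], r["varType"], r["xRef"] or "-")
```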

In this case, we should report a SNP like "alleles AT/GT;ref_allele AG". It is missed because the script expects to find a line for allele 1 whose begin/end positions match those of a line for allele 2. Here no single allele 1 line matches: allele 1's two lines (47860-47861 and 47861-47862) only match allele 2's line (47860-47862) when combined.
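The mismatch, and one way around it, can be illustrated with a small sketch. It is not the code in cgi1.3_to_gff.py or in the fix; merge_alleles and gff_attributes are hypothetical names, and the rows are assumed to be dicts keyed by the column names above with integer begin/end. The per-line comparison is a simplified stand-in for the check described above. The sketch joins each allele's contiguous rows into one span before comparing, so allele 1's ref + snp rows become AT over 47860-47862 and line up with allele 2's sub row.

```python
from itertools import groupby
from operator import itemgetter

# Rows from locus 1109, as dicts keyed by the var-file column names.
ROWS = [
    {"allele": "1", "begin": 47860, "end": 47861, "varType": "ref",
     "reference": "A", "alleleSeq": "A"},
    {"allele": "1", "begin": 47861, "end": 47862, "varType": "snp",
     "reference": "G", "alleleSeq": "T"},
    {"allele": "2", "begin": 47860, "end": 47862, "varType": "sub",
     "reference": "AG", "alleleSeq": "GT"},
]


def merge_alleles(rows):
    """Join each allele's rows into one (begin, end, reference, alleleSeq) span.

    Returns None if any allele's rows are not contiguous.
    """
    merged = {}
    by_allele = itemgetter("allele")
    for allele, group in groupby(sorted(rows, key=by_allele), key=by_allele):
        parts = sorted(group, key=itemgetter("begin"))
        for prev, cur in zip(parts, parts[1:]):
            if prev["end"] != cur["begin"]:
                return None  # gap or overlap: cannot combine this allele
        merged[allele] = (
            parts[0]["begin"],
            parts[-1]["end"],
            "".join(p["reference"] for p in parts),
            "".join(p["alleleSeq"] for p in parts),
        )
    return merged


def gff_attributes(rows):
    """Build the GFF attribute string if both alleles cover the same region."""
    merged = merge_alleles(rows)
    if merged is None or set(merged) != {"1", "2"}:
        return None
    b1, e1, ref1, seq1 = merged["1"]
    b2, e2, ref2, seq2 = merged["2"]
    if (b1, e1) != (b2, e2) or ref1 != ref2:
        return None  # the combined spans still disagree
    return f"alleles {seq1}/{seq2};ref_allele {ref1}"


# Per-line matching finds no allele-1 row with the allele-2 row's begin/end.
spans1 = {(r["begin"], r["end"]) for r in ROWS if r["allele"] == "1"}
spans2 = {(r["begin"], r["end"]) for r in ROWS if r["allele"] == "2"}
print(spans1 & spans2)       # set()
print(gff_attributes(ROWS))  # alleles AT/GT;ref_allele AG
```

Merging before comparing means the same check also covers the simple case where each allele is a single line.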

Associated revisions

Revision 6bc0e858
Added by Madeleine Ball about 11 years ago

Fix CGI interpretation bug

This fixes #531, which found that some heterozygous calls were being missed.
This script should get everything, but will skip as "not-called" any
region with heterozygous/individual allele calls in which any part
of the region is no-call.


#1 Updated by Madeleine Ball about 11 years ago

  • Status changed from New to In Progress
  • Assigned To set to Madeleine Ball

#2 Updated by Madeleine Ball about 11 years ago

I've pushed my fix. It still skips any het allele regions (alleles reported separately) if any part of the region is not called (even if it's just part of one allele).
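The skip rule described here can be sketched as a single check over a grouped region. This is an illustration under assumptions, not the pushed fix: classify_region is a hypothetical name, and treating any varType that begins with "no-call" as a no-call is an assumption about how such rows are labeled in CGI 1.3 var files.

```python
def classify_region(rows):
    """Return "not-called" if any row of the region is a no-call, else "called".

    Assumes no-call rows have a varType beginning with "no-call";
    adjust the test if the var file uses other labels.
    """
    if any(r["varType"].startswith("no-call") for r in rows):
        return "not-called"
    return "called"


# Allele 1 is fully called, but half of allele 2 is a no-call,
# so the whole region is skipped.
region = [
    {"allele": "1", "begin": 47860, "end": 47862, "varType": "sub"},
    {"allele": "2", "begin": 47860, "end": 47861, "varType": "ref"},
    {"allele": "2", "begin": 47861, "end": 47862, "varType": "no-call"},
]
print(classify_region(region))  # not-called
```

Checking the whole region at once is what makes a partial no-call on just one allele enough to skip it.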

#3 Updated by Madeleine Ball about 11 years ago

  • Status changed from In Progress to Resolved
  • Resolution set to fixed

My fix has been pulled into master, so I'm marking this as resolved.
