Bug #531


Missed SNPs via CGI 1.3 to GFF conversion script

Added by Evan Maxwell over 13 years ago. Updated about 13 years ago.


In the Python script that converts CGI files into our GFF format (source:server/conversion/), certain SNPs are missed (i.e., not reported in the GFF output) because of the conditional statement at line 63. For example, see locus 1109 in file var-GS06985-1100-36-ASM.tsv:

    locus  ploidy  allele  chromosome  begin  end    varType  reference  alleleSeq  totalScore  hapLink  xRef
    1109   2       1       chr1        47860  47861  ref      A          A          135         453
    1109   2       1       chr1        47861  47862  snp      G          T          135         453      dbsnp.100:rs2531231
    1109   2       2       chr1        47860  47862  sub      AG         GT         135         454      dbsnp.100:rs2531230;dbsnp.100:rs2531231

In this case, we should report a SNP such as "alleles AT/GT;ref_allele AG". It is missed because the script expects each allele 1 line to have an allele 2 line with the same begin/end positions. Here, however, the two allele 1 lines (47860-47861 and 47861-47862) only match the allele 2 line (47860-47862) when considered in combination.
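A minimal sketch of the idea, assuming the column layout shown above: concatenate each allele's contiguous records into one span, and compare the allele spans only after combining them. `VarRecord`, `merge_allele`, and `het_snp_attributes` are hypothetical names, not functions from the conversion script. The attribute string simply mirrors the example in this report.

```python
from typing import List, NamedTuple, Optional


class VarRecord(NamedTuple):
    """Hypothetical subset of the columns in one CGI var-file row."""
    allele: str
    begin: int
    end: int
    var_type: str
    reference: str
    allele_seq: str


def merge_allele(records: List[VarRecord], allele: str) -> Optional[VarRecord]:
    """Concatenate one allele's records into a single span.

    Returns None if the allele has no records or if its records are not
    strictly contiguous (each one must begin where the previous one ends).
    """
    rows = sorted((r for r in records if r.allele == allele), key=lambda r: r.begin)
    if not rows:
        return None
    for prev, cur in zip(rows, rows[1:]):
        if cur.begin != prev.end:  # gap or overlap: do not guess
            return None
    return VarRecord(
        allele=allele,
        begin=rows[0].begin,
        end=rows[-1].end,
        var_type="merged",
        reference="".join(r.reference for r in rows),
        allele_seq="".join(r.allele_seq for r in rows),
    )


def het_snp_attributes(records: List[VarRecord]) -> Optional[str]:
    """Build a GFF attribute string once both alleles cover the same span."""
    a1 = merge_allele(records, "1")
    a2 = merge_allele(records, "2")
    if a1 is None or a2 is None:
        return None
    if (a1.begin, a1.end) != (a2.begin, a2.end):
        return None  # spans still disagree after combining
    return f"alleles {a1.allele_seq}/{a2.allele_seq};ref_allele {a1.reference}"


# Locus 1109 from var-GS06985-1100-36-ASM.tsv (relevant columns only).
locus_1109 = [
    VarRecord("1", 47860, 47861, "ref", "A", "A"),
    VarRecord("1", 47861, 47862, "snp", "G", "T"),
    VarRecord("2", 47860, 47862, "sub", "AG", "GT"),
]
print(het_snp_attributes(locus_1109))  # alleles AT/GT;ref_allele AG
```

Requiring contiguity keeps the merge from silently bridging a stretch that the var file does not actually describe for that allele.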

#1

Updated by Madeleine Ball over 13 years ago

  • Status changed from New to In Progress
  • Assigned To set to Madeleine Ball

#2

Updated by Madeleine Ball over 13 years ago

I've pushed my fix. It still skips any heterozygous region whose alleles are reported separately if any part of that region is not called, even if the uncalled part belongs to only one allele.
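The remaining limitation amounts to an all-or-nothing test over the region. Below is a sketch of such a check. It assumes that no-calls are marked by a varType beginning with "no-call", or by "?" or "N" in alleleSeq. That encoding is an assumption about CGI 1.3, not something taken from the script. `region_fully_called` is a hypothetical helper.

```python
from typing import Iterable, Tuple

# Hypothetical row shape: (allele, varType, alleleSeq).
Row = Tuple[str, str, str]


def region_fully_called(rows: Iterable[Row]) -> bool:
    """Return False if any part of the region, on either allele, is uncalled.

    Assumption: no-calls appear as a varType starting with "no-call"
    (e.g. "no-call", "no-call-rc") or as "?" / "N" in alleleSeq.
    """
    for _allele, var_type, allele_seq in rows:
        if var_type.startswith("no-call"):
            return False
        if "?" in allele_seq or "N" in allele_seq:
            return False
    return True


# One allele partly uncalled: the whole het region would be skipped.
region = [("1", "ref", "A"), ("1", "snp", "T"), ("2", "no-call", "?")]
if not region_fully_called(region):
    print("skipping het region")
```

A finer-grained alternative would report only the called sub-spans. The check above models the current behaviour, in which a single uncalled base drops the whole region.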

#3

Updated by Madeleine Ball about 13 years ago

  • Status changed from In Progress to Resolved
  • Resolution set to fixed

My fix has been pulled into the master branch, so I'm marking this as resolved.

