Missed SNPs via CGI 1.3 to GFF conversion script
In the python script which processes CGI files into our GFF format (source:server/conversion/cgi1.3_to_gff.py), certain SNPs are missed (i.e. not reported in GFF) due to the conditional statement at line 63. For example, see locus 1109 in file var-GS06985-1100-36-ASM.tsv:
locus ploidy allele chromosome begin end varType reference alleleSeq totalScore hapLink xRef
1109 2 1 chr1 47860 47861 ref A A 135 453
1109 2 1 chr1 47861 47862 snp G T 135 453 dbsnp.100:rs2531231
1109 2 2 chr1 47860 47862 sub AG GT 135 454 dbsnp.100:rs2531230;dbsnp.100:rs2531231
In this case, we should report a SNP like "alleles AT/GT;ref_allele AG". This is missed because we expect to find a line for allele 1 which matches a line for allele 2 in terms of begin/end position. In this case, however, they only match when considered in combination.