Bug #531

closed

Missed SNPs via CGI 1.3 to GFF conversion script

Added by Evan Maxwell about 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Target version: -
Story points: -

Description

In the Python script that converts CGI files into our GFF format (source:server/conversion/cgi1.3_to_gff.py), certain SNPs are missed (i.e., not reported in the GFF output) because of the conditional statement at line 63. For example, see locus 1109 in file var-GS06985-1100-36-ASM.tsv:

locus  ploidy  allele  chromosome  begin  end    varType  reference  alleleSeq  totalScore  hapLink  xRef
1109   2       1       chr1        47860  47861  ref      A          A          135         453
1109   2       1       chr1        47861  47862  snp      G          T          135         453      dbsnp.100:rs2531231
1109   2       2       chr1        47860  47862  sub      AG         GT         135         454      dbsnp.100:rs2531230;dbsnp.100:rs2531231

In this case, we should report a SNP like "alleles AT/GT;ref_allele AG". It is missed because we expect to find a line for allele 1 whose begin/end positions match those of a line for allele 2. Here, however, the positions match only when the two allele 1 lines (47860-47861 and 47861-47862) are considered in combination: together they span 47860-47862, the same interval as the single allele 2 line.
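One possible way to handle this case is sketched below. It is not the code in cgi1.3_to_gff.py. The function names (merge_allele_segments, describe_locus) and the simplified row tuples are hypothetical, chosen only to illustrate the idea of concatenating the contiguous lines of each allele before comparing begin/end positions across alleles.

```python
from collections import defaultdict

# Hypothetical, simplified rows for locus 1109:
# (locus, allele, begin, end, reference, alleleSeq)
ROWS = [
    ("1109", "1", 47860, 47861, "A", "A"),
    ("1109", "1", 47861, 47862, "G", "T"),
    ("1109", "2", 47860, 47862, "AG", "GT"),
]


def merge_allele_segments(rows):
    """Concatenate the contiguous segments of each allele within one locus.

    Returns {allele: (begin, end, reference, allele_seq)}.
    Raises ValueError if an allele's segments are not contiguous.
    """
    by_allele = defaultdict(list)
    for _locus, allele, begin, end, ref, seq in rows:
        by_allele[allele].append((begin, end, ref, seq))

    merged = {}
    for allele, segments in by_allele.items():
        segments.sort()  # order by begin position
        begin, end, ref, seq = segments[0]
        for next_begin, next_end, next_ref, next_seq in segments[1:]:
            if next_begin != end:
                raise ValueError("non-contiguous segments for allele %s" % allele)
            end, ref, seq = next_end, ref + next_ref, seq + next_seq
        merged[allele] = (begin, end, ref, seq)
    return merged


def describe_locus(rows):
    """Return an 'alleles X/Y;ref_allele Z' string, or None if the merged
    alleles do not cover the same reference interval."""
    merged = merge_allele_segments(rows)
    spans = {(begin, end, ref) for begin, end, ref, _seq in merged.values()}
    if len(spans) != 1:
        return None
    (_begin, _end, ref), = spans
    alleles = sorted({seq for _b, _e, _r, seq in merged.values()})
    return "alleles %s;ref_allele %s" % ("/".join(alleles), ref)


print(describe_locus(ROWS))  # alleles AT/GT;ref_allele AG
```

With the three lines of locus 1109, the two allele 1 segments merge into a single 47860-47862 record with sequence AT, which can then be compared directly with the allele 2 record.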
