Bug #531
[closed] Missed SNPs via CGI 1.3 to GFF conversion script
Description
In the Python script which processes CGI files into our GFF format (source:server/conversion/cgi1.3_to_gff.py), certain SNPs are missed (i.e. not reported in the GFF output) due to the conditional statement at line 63. For example, see locus 1109 in file var-GS06985-1100-36-ASM.tsv:
locus  ploidy  allele  chromosome  begin  end    varType  reference  alleleSeq  totalScore  hapLink  xRef
1109   2       1       chr1        47860  47861  ref      A          A          135         453
1109   2       1       chr1        47861  47862  snp      G          T          135         453      dbsnp.100:rs2531231
1109   2       2       chr1        47860  47862  sub      AG         GT         135         454      dbsnp.100:rs2531230;dbsnp.100:rs2531231
In this case, we should report a SNP like "alleles AT/GT;ref_allele AG". This is missed because we expect each allele 1 line to match an allele 2 line with the same begin/end position. Here, however, allele 1 is split across two lines (47860-47861 and 47861-47862) that match the single allele 2 line (47860-47862) only when considered in combination.
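To illustrate the "considered in combination" case, here is a minimal sketch of one way to join each allele's contiguous segments before comparing spans. It is not the code in cgi1.3_to_gff.py: the function name combine_locus, the tuple layout, and the choice to return None for anything it cannot combine are all assumptions made for this example.

```python
from collections import defaultdict

# The three rows for locus 1109, reduced to the columns used here:
# (locus, allele, begin, end, varType, reference, alleleSeq)
ROWS = [
    (1109, 1, 47860, 47861, "ref", "A", "A"),
    (1109, 1, 47861, 47862, "snp", "G", "T"),
    (1109, 2, 47860, 47862, "sub", "AG", "GT"),
]


def combine_locus(rows):
    """Hypothetical helper: join each allele's segments, then compare alleles.

    Returns a GFF-style attribute string, or None if the alleles cannot be
    combined over a single shared span or if nothing differs from reference.
    """
    by_allele = defaultdict(list)
    for _locus, allele, begin, end, _var_type, ref, seq in rows:
        by_allele[allele].append((begin, end, ref, seq))

    spans, refs, seqs = set(), set(), []
    for allele in sorted(by_allele):
        segments = sorted(by_allele[allele])
        # Each segment must start where the previous one ended.
        for prev, nxt in zip(segments, segments[1:]):
            if prev[1] != nxt[0]:
                return None
        spans.add((segments[0][0], segments[-1][1]))
        refs.add("".join(s[2] for s in segments))
        seqs.append("".join(s[3] for s in segments))

    # All alleles must cover the same span with the same reference sequence.
    if len(spans) != 1 or len(refs) != 1:
        return None
    ref = refs.pop()
    if all(seq == ref for seq in seqs):
        return None  # homozygous reference, nothing to report
    return "alleles %s;ref_allele %s" % ("/".join(sorted(set(seqs))), ref)


print(combine_locus(ROWS))  # alleles AT/GT;ref_allele AG
```

For locus 1109, the two allele 1 segments join into AT over 47860-47862, which then lines up with the single allele 2 line.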
Updated by Madeleine Ball almost 14 years ago
- Status changed from New to In Progress
- Assigned To set to Madeleine Ball
Updated by Madeleine Ball almost 14 years ago
I've pushed my fix. It still skips any het allele regions (alleles reported separately) if any part of the region is not called (even if it's just part of one allele).
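A rough sketch of that skip rule, not the code from the fix itself. It assumes that uncalled segments carry a varType beginning with "no-call", which is an assumption about the var file rather than something stated in this ticket. The helper name locus_fully_called is hypothetical.

```python
def locus_fully_called(rows):
    """Hypothetical check: True only if no segment of the locus is a no-call.

    rows is a list of (allele, varType) pairs for one locus.
    """
    return not any(var_type.startswith("no-call") for _allele, var_type in rows)


# Locus 1109 from the description: every segment is called, so it is kept.
print(locus_fully_called([(1, "ref"), (1, "snp"), (2, "sub")]))

# One uncalled segment on one allele is enough to skip the whole region.
print(locus_fully_called([(1, "ref"), (1, "no-call"), (2, "sub")]))
```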
Updated by Madeleine Ball over 13 years ago
- Status changed from In Progress to Resolved
- Resolution set to fixed
My fix has been pulled into the master, so I'm marking this as resolved.