Find consensus nonref alleles & call when matching ref
At some positions the reference genome carries the rare, non-consensus allele, so the common allele gets reported as the variant. Factor V Leiden is one example. Hopefully the reference will be fixed someday, but for now we can work around the issue this way:
(1) Write a program that generates a list of positions where the majority allele in CGI's Diversity panel is not the reference allele.
(2) Change the current "add reference allele" Python step so that it checks every position in this list: if a position is called as matching the reference, create a separate line calling it as a variant. For all variants, attach the consensus allele rather than the reference allele.
(3) Modify the amino acid prediction program to use the consensus allele when assembling the reference transcript it compares against. (I think it currently ignores the reference allele and just pulls the region from the reference, but I could be misremembering.)
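
A rough sketch of steps (1) and (2), to make the intended behavior concrete. Everything here is an assumption for illustration: the function names, the in-memory dict/tuple layout, "majority" meaning a strict majority of panel alleles, and the toy coordinates are all made up and do not reflect the actual pipeline's file formats or the real Diversity panel data.

```python
from collections import Counter


def find_consensus_nonref(panel_alleles, reference):
    """Step (1): list sites where the panel's majority allele is not the reference.

    panel_alleles: {(chrom, pos): [allele, ...]} observed across panel genomes
                   (hypothetical layout).
    reference:     {(chrom, pos): reference_allele}
    Returns        {(chrom, pos): consensus_allele}
    """
    consensus = {}
    for site, alleles in panel_alleles.items():
        if not alleles:
            continue
        allele, count = Counter(alleles).most_common(1)[0]
        # Assumption: require a strict majority, so ties are not listed.
        if 2 * count > len(alleles) and allele != reference[site]:
            consensus[site] = allele
    return consensus


def annotate_calls(calls, reference, consensus):
    """Step (2): emit variant lines with the consensus allele attached.

    calls: iterable of (chrom, pos, called_allele).
    Returns a list of (chrom, pos, called_allele, consensus_allele).
    """
    out = []
    for chrom, pos, allele in calls:
        site = (chrom, pos)
        ref = reference[site]
        # Outside the list, the consensus allele is just the reference allele.
        cons = consensus.get(site, ref)
        if allele != ref:
            # Ordinary variant: keep the line, attach the consensus allele.
            out.append((chrom, pos, allele, cons))
        elif site in consensus:
            # Matches the reference, but the reference is the minority allele:
            # add a separate line calling it as a variant.
            out.append((chrom, pos, allele, cons))
    return out


# Toy data with made-up coordinates.
reference = {("chr1", 100): "G", ("chr1", 200): "C", ("chr2", 50): "G"}
panel = {
    ("chr1", 100): ["A", "A", "A", "G"],  # majority A, reference G
    ("chr1", 200): ["C", "C", "T", "T"],  # tie, not listed
    ("chr2", 50): ["G", "G", "G"],        # majority matches reference
}
consensus = find_consensus_nonref(panel, reference)
calls = [("chr1", 100, "G"), ("chr1", 200, "T"), ("chr2", 50, "G")]
annotated = annotate_calls(calls, reference, consensus)
```

The sketch leaves one question open: what to do with a call that matches the consensus allele at a listed position. Here it is still emitted as a variant relative to the reference, since the steps above do not say to drop it.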