Bug #506

Standardize gene names (correctly)

Added by Madeleine Ball over 9 years ago. Updated over 8 years ago.

Status: New
Priority: Normal
Assigned To:
Target version: -
Start date:
Due date:
% Done: 0%

Description

Some attempt at using standard gene names has been made, but there are mistakes in the implementation: in some cases a name is both a standard name and an alias that could point to a different standard name. The most conservative thing to do in these cases would be to accept the given name and not change it. One might also use the positions associated with gene names to distinguish between the two cases.

Right now this has been implemented incorrectly somewhere, resulting in gene names that are inconsistent with the position (in other words, a variant is moved to a new standard name when the old name was correct and standard).

The current GET-Evidence GJA9 L422F is a variant in ABT at chr1 39113094. According to genenames.org, GJA9 is both a standard name (for a gene on chr1) and an alias for another standard name, "GJD2" (on chr15). So based on the position we can infer that "GJA9" is the correct, standard name for this variant.

But GET-Evidence incorrectly has this variant entered under GJD2: http://evidence.personalgenomes.org/GJD2-L422F
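
The rule proposed above (keep a given name that is already standard, and only use position to break ties) could be sketched roughly as follows. This is an illustration only, not the GET-Evidence code: the function `resolve_gene_name` and both lookup tables are hypothetical, and the tables contain only the GJA9/GJD2 facts quoted in this report.

```python
from typing import Optional

# Hypothetical lookup tables, filled in only with the facts cited in this report.
# Standard symbol -> chromosome of that gene.
STANDARD_CHROM = {"GJA9": "1", "GJD2": "15"}
# Alias -> standard symbols that the alias may refer to.
ALIAS_TARGETS = {"GJA9": ["GJD2"]}


def normalize_chrom(chrom: str) -> str:
    """Turn 'chr1' or ' 1 ' into '1' so chromosomes compare consistently."""
    c = chrom.strip()
    if c.lower().startswith("chr"):
        c = c[3:]
    return c


def resolve_gene_name(name: str, chrom: Optional[str] = None) -> str:
    """Return the gene name to file a variant under.

    A name that is already standard is kept unless the variant's
    chromosome contradicts it. An alias is only replaced when exactly
    one target symbol is consistent with the position.
    """
    c = normalize_chrom(chrom) if chrom else None

    if name in STANDARD_CHROM:
        # Conservative default: accept the given standard name as-is.
        if c is None or STANDARD_CHROM[name] == c:
            return name

    # Alias targets that are consistent with the position (if one was given).
    candidates = [
        target
        for target in ALIAS_TARGETS.get(name, [])
        if c is None or STANDARD_CHROM.get(target) == c
    ]
    if len(candidates) == 1:
        return candidates[0]

    # Unknown or still ambiguous: leave the name unchanged.
    return name
```

With these tables, `resolve_gene_name("GJA9", "chr1")` keeps "GJA9", which is the behaviour this bug asks for; the name would only be changed to "GJD2" if the variant were reported on chr15.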


Related issues

Related to GET-Evidence - Feature #485: Don't let users add variant pages for genes that aren't in knownGeneNew

History

#1 Updated by Tom Clegg over 9 years ago

The "incorrect fix" has been backed out, although there are still (probably) cases of multiple variants that refer to the same gene under different names.

#2 Updated by Madeleine Ball about 9 years ago

BLM and RECQL3 both refer to the same gene (UCSC currently uses BLM rather than RECQL3). In GET we have 2 OMIM imported variants in RECQL3 but all genomes are processed as BLM.

#3 Updated by Ward Vandewege over 8 years ago

  • Project changed from External to GET-Evidence
  • Category deleted (GET-Evidence)

#4 Updated by Madeleine Ball over 8 years ago

Changes in the transcript file we use mean that the gene names now produced are almost forced to be names that are in the HGNC gene names list and consistent with chromosome info (if available):

HGNC list:
http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=onlevel=pri&=on&order_by=gd_app_sym_sort&limit=&format=text&.cgifields=&.cgifields=level&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag&&where=&status=Approved&status_opt=1&submit=submit&col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_prev_sym&col=gd_aliases&col=gd_pub_chrom_map&col=gd_pub_acc_ids&col=gd_pub_refseq_ids

Names are added by source:server/script/getCanonicalWithName.pl
(It's still possible to have a non-HGNC name, but only if you couldn't find an HGNC name after trying all the steps outlined in the above script.)
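
As a rough illustration of how a tab-separated list like the HGNC one could be checked for names that are both approved symbols and aliases, here is a small sketch. It does not reproduce the steps in getCanonicalWithName.pl. The column headers ("Approved Symbol", "Aliases", "Chromosome"), the sample rows, and the helper names are all assumptions made for the example; the BLM chromosome is left blank on purpose to show the "if available" case.

```python
import csv
import io
import re
from collections import defaultdict

# Illustrative sample only: headers and rows are assumptions, not real HGNC output.
SAMPLE = (
    "Approved Symbol\tAliases\tChromosome\n"
    "GJA9\t\t1p34.3\n"
    "GJD2\tGJA9\t15q14\n"
    "BLM\tRECQL3\t\n"
)


def chrom_from_band(band):
    """Extract the chromosome from a band string such as '15q14'; None if absent."""
    match = re.match(r"(\d+|X|Y)", band.strip())
    return match.group(1) if match else None


def build_tables(text):
    """Build (standard symbol -> chromosome, alias -> [standard symbols])."""
    standard_chrom = {}
    alias_targets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        symbol = row["Approved Symbol"].strip()
        standard_chrom[symbol] = chrom_from_band(row["Chromosome"] or "")
        # Aliases are assumed to be comma-separated within one column.
        for alias in (row["Aliases"] or "").split(","):
            alias = alias.strip()
            if alias:
                alias_targets[alias].append(symbol)
    return standard_chrom, dict(alias_targets)


def ambiguous_names(standard_chrom, alias_targets):
    """Names that are both a standard symbol and an alias of another symbol."""
    return sorted(name for name in alias_targets if name in standard_chrom)
```

Names returned by `ambiguous_names` are the ones where chromosome information is needed before renaming anything; a plain alias such as RECQL3 maps to a single symbol and is not flagged.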

I think we should consider removing all GET-Evidence entries that are not in one of the imported databases (OMIM/PharmGKB/etc.) and were only found in one of the old genome processing runs. This will clean out messed-up placements and gene names that will never be looked at again. There may still be some nonstandard gene names from OMIM, but I think that's less of a concern.

#5 Updated by Madeleine Ball over 8 years ago

  • Priority changed from High to Normal
