Feature #522

closed

Add Polyphen 2 predictions to GET-Evidence

Added by Madeleine Ball about 13 years ago. Updated almost 13 years ago.

Status:
Resolved
Priority:
High
Assigned To:
-
Target version:
-
Story points:
-

Description

We currently use the BLOSUM100 score to identify variants as being "disruptive", but there are algorithms specialized for making a computational prediction of pathogenic (or otherwise phenotypic) effect, e.g. Polyphen and SIFT. Some connections with the Sunyaev lab make Polyphen a good candidate for use in GET-Evidence, and the Polyphen 2 data is entirely downloadable:
http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads
In particular, "PolyPhen-2 annotations for whole human proteome sequence space (WHPSS) build 3" contains all possible amino acid changes caused by single base substitutions.
WARNING: THE FILES ARE TARBOMBS. Move them into a new directory before extracting.
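
One way to contain a tarbomb is to extract it into a freshly created directory. The following is a minimal Python sketch of that idea, assuming the download is a local tar archive; the function name `extract_into_new_dir` and any paths passed to it are illustrative and not part of GET-Evidence.

```python
import os
import tarfile


def extract_into_new_dir(archive_path, dest_dir):
    """Extract a tar archive into dest_dir, which must not already exist."""
    # Refuse to reuse a directory so the archive's top-level files
    # cannot mix with (or overwrite) anything already there.
    os.makedirs(dest_dir, exist_ok=False)
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            # Reject members whose names would land outside dest_dir.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: " + member.name)
        archive.extractall(dest_root)
    return dest_root
```

Creating the directory with `exist_ok=False` is a deliberate choice here: if the target already exists, the call fails instead of scattering files into it.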

Even though there are licensing issues with the code itself, we think integrating the downloadable dataset should be okay.

How should it be incorporated? Not sure.

The file is huge (1.6 GiB), so it seems like a bad idea to require it in every instance of GET-Evidence; maybe it should live only on the production server. We could create a script that regularly checks GET-Evidence for variants with amino acid changes that are missing Polyphen 2 data and updates them; this script would not run on most instances of GET-Evidence.

Unfortunately, if we want to prioritize an insufficiently evaluated variant by autoscore, and the variant is not yet in GET-Evidence, we won't be able to use the Polyphen score in the autoscoring. Maybe we could have some backup behavior using the BLOSUM score. Maybe installations could default to the dbSNP version, "PolyPhen-2 annotations for dbSNP build 131", which is only 16MB?
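
The backup behavior could look something like the Python sketch below. It is not how GET-Evidence autoscoring is implemented; the function name `pick_disruption_score` and its arguments are hypothetical. It prefers a Polyphen 2 score when one is available and otherwise falls back to BLOSUM100.

```python
from typing import Optional, Tuple


def pick_disruption_score(
    polyphen2: Optional[float],
    blosum100: Optional[int],
) -> Optional[Tuple[str, float]]:
    """Return (source, score), preferring Polyphen 2 over BLOSUM100."""
    if polyphen2 is not None:
        return ("polyphen2", polyphen2)
    if blosum100 is not None:
        # Fallback for variants with no Polyphen 2 data yet.
        return ("blosum100", float(blosum100))
    # Neither score is known for this variant.
    return None
```

The source label is returned alongside the value because the two scores are on different scales, so a caller should not compare them directly.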

Note: If I recall correctly, the gene IDs in their data are UniProt IDs, and I think those are also in knownGene.txt.

Actions #1

Updated by Madeleine Ball about 13 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Tom Clegg about 13 years ago

8e07102 adds import_polyphen2.php which reads a TSV file (variant_name, variant_id, polyphen2_score) and installs it as an "other external references" entry.

Caveat: import_polyphen2 doesn't do any sanity-checking of variant_name/variant_id -- it just assumes that the variant_id is correct for this installation. In other words, whatever upstream script looks up the variant_ids must be run on the same database as import_polyphen2 itself.
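
For illustration, the missing sanity check might look like the Python sketch below. This is not the import_polyphen2.php code. It assumes the three-column TSV layout described above, assumes variant_id is an integer, and uses a hypothetical `known_variants` mapping (variant_id to variant_name) in place of a real database lookup.

```python
import csv


def read_polyphen2_tsv(handle, known_variants):
    """Yield (variant_id, score) for each row that passes the check.

    known_variants maps variant_id -> variant_name for the local
    installation (hypothetical; a real check would query the database).
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            raise ValueError("expected 3 columns, got %d" % len(row))
        variant_name, variant_id, score = row
        variant_id = int(variant_id)  # assumption: IDs are integers
        # Reject rows whose ID belongs to a different variant (or to none)
        # in this installation.
        if known_variants.get(variant_id) != variant_name:
            raise ValueError(
                "variant_id %d does not match %r" % (variant_id, variant_name)
            )
        yield variant_id, float(score)
```

A check like this would catch a TSV generated against a different database, at the cost of one lookup per row.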

Actions #3

Updated by Madeleine Ball almost 13 years ago

  • Status changed from New to Resolved

a4b2b95c added a program which pulls PPH2 scores from the PPH2 dump for variants matching GET-Evidence variants. Tom then imported these into the GET-Evidence database.
