Single nucleotide polymorphisms (SNPs) comprise the majority of the genetic differences between human individuals. Non-synonymous coding SNPs (nsSNPs), which result in amino acid replacements in protein sequences, together with cis-regulatory SNPs affecting transcription and splicing, are collectively thought to account for much of the genetic component of individual variation in susceptibility to complex diseases, response to pharmaceuticals, and other phenotypes. Identification of functional nsSNPs can be facilitated by computational predictions based on the analysis of protein multiple sequence alignments, 3D structures and sequence annotations. This analysis was earlier automated in the computer program PolyPhen, an online tool maintained in our laboratory. Numerous researchers in diverse fields currently use PolyPhen to predict the effect of nsSNPs on protein structure and function. However, there is an increasing need for more accurate computational approaches to improve such predictions and to expand the applicability of PolyPhen to all classes of polymorphisms. This proposal focuses on improving the methods incorporated in PolyPhen for predicting the functional effect of SNPs in the human genome, and on transforming PolyPhen into scalable, user-friendly, cross-platform software. The proposal targets three Specific Aims. First, we propose to improve the accuracy of PolyPhen by introducing new computational strategies for predicting the effect of nsSNPs on protein structure and function (Specific Aim 1). Methodological innovations will include the development of a multiple sequence alignment pipeline that suppresses false predictions arising from misalignments. A new method will eliminate false-negative predictions resulting from compensatory substitutions in homologous sequences. We will use a structurally optimized Bayesian classifier to predict the functional effect of nsSNPs based on multiple features derived from protein sequence and structure.
Next, we propose to extend the prediction method to non-coding SNPs (Specific Aim 2). We plan to take advantage of the extensive comparative genomic data that have been and continue to be generated. We will introduce a computational approach to predict functional SNPs in non-coding regions on the basis of probabilistic evolutionary models. Finally, we plan to incorporate these developments into a new version of the PolyPhen software system, which will address the significant demand for a robust, cross-platform tool that can be easily applied by diverse investigators to the problem of functional analysis of human SNPs (Specific Aim 3). This new version of PolyPhen will be incorporated into the Clinical Research Chart developed by the I2b2 National Center of Biomedical Computing and integrated with VISTA visualization tools.