Single nucleotide polymorphisms (SNPs) account for the majority of the genetic differences between human individuals. Non-synonymous coding SNPs (nsSNPs), which cause amino acid replacements in protein sequences, together with cis-regulatory SNPs affecting transcription and splicing, are thought collectively to account for much of the genetic component of individual variation in susceptibility to complex diseases, response to pharmaceuticals, and other phenotypes. Identification of functional nsSNPs can be facilitated by computational predictions based on the analysis of protein multiple sequence alignments, 3D structures, and sequence annotations. We previously automated this analysis in the computer program PolyPhen, an online tool maintained in our laboratory. Numerous researchers in diverse fields currently use PolyPhen to predict the effect of nsSNPs on protein structure and function. However, there is an increasing need for more accurate computational approaches that improve such predictions and extend the applicability of PolyPhen to all classes of polymorphisms. This proposal focuses on improving the methods incorporated in PolyPhen for predicting the functional effect of SNPs in the human genome, and on transforming PolyPhen into scalable, user-friendly, cross-platform software. The proposal targets three Specific Aims. First, we propose to improve the accuracy of PolyPhen by introducing new computational strategies for predicting the effect of nsSNPs on protein structure and function (Specific Aim 1). Methodological innovations will include a multiple sequence alignment pipeline that suppresses false predictions arising from misalignments. A new method will eliminate false-negative predictions resulting from compensatory substitutions in homologous sequences. We will use a structurally optimized Bayesian classifier to predict the functional effect of nsSNPs from multiple features derived from protein sequence and structure.
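The abstract does not describe the classifier's internals. As a rough illustration of how a Bayesian classifier can combine several sequence- and structure-derived features into one prediction, here is a minimal naive Bayes sketch. Naive Bayes is a simplification swapped in for illustration: it assumes features are independent given the class, whereas a structurally optimized Bayesian classifier may model dependencies between features. The feature values (`high`/`low` conservation, `buried`/`exposed` residue), the class labels, and the training rows are hypothetical toy data, not PolyPhen data.

```python
import math
from collections import Counter


class NaiveBayes:
    """Naive Bayes over discrete features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # feature_counts[c][i][v]: how often feature i took value v in class c
        self.feature_counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for i, v in enumerate(row):
                self.feature_counts[c][i][v] += 1
                self.values[i].add(v)
        return self

    def predict_proba(self, row):
        total = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            n_c = self.class_counts[c]
            score = math.log(n_c / total)  # log prior
            for i, v in enumerate(row):
                k = len(self.values[i])
                # smoothed log P(feature i = v | class c)
                score += math.log((self.feature_counts[c][i][v] + self.alpha) / (n_c + self.alpha * k))
            log_scores[c] = score
        # normalize in log space to get posterior probabilities
        m = max(log_scores.values())
        weights = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


# Hypothetical toy training set: (conservation, burial) -> label
rows = [("high", "buried"), ("high", "buried"), ("high", "exposed"),
        ("low", "exposed"), ("low", "exposed"), ("low", "buried")]
labels = ["damaging", "damaging", "damaging", "benign", "benign", "benign"]

model = NaiveBayes().fit(rows, labels)
p = model.predict_proba(("high", "buried"))
```

On this toy set, a substitution at a highly conserved, buried position gets a posterior of 6/7 (about 0.857) for `damaging`. Working in log space avoids underflow when many features are multiplied together.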
Next, we propose to extend the prediction method to non-coding SNPs (Specific Aim 2). We plan to take advantage of the extensive comparative genomic data that have been, and continue to be, generated. We will introduce a computational approach to predict functional SNPs in non-coding regions on the basis of probabilistic evolutionary models. Finally, we plan to incorporate these developments into a new version of the PolyPhen software system, addressing the significant demand for a robust, cross-platform tool that diverse investigators can readily apply to the functional analysis of human SNPs (Specific Aim 3). This new version of PolyPhen will be incorporated into the Clinical Research Chart developed by the i2b2 National Center for Biomedical Computing and integrated with the VISTA visualization tools.
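The abstract does not specify which probabilistic evolutionary model will be used for non-coding sites. One common idea, sketched here under stated assumptions, is to ask whether an aligned column of orthologous bases is better explained by a tree with shortened (constrained) branches than by the neutral tree; a SNP at a strongly constrained site is a more plausible functional candidate. This sketch uses the Jukes-Cantor substitution model with Felsenstein's pruning algorithm. The tree topology, species names, branch lengths, and the scaling factor `rho` are all hypothetical, and a real method would fit these from data.

```python
import math

BASES = ("A", "C", "G", "T")


def jc69(t):
    """Jukes-Cantor probabilities (same base, specific different base) after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def partials(node, column, scale):
    """Felsenstein pruning: P(observed leaves below node | node state) for each base."""
    if isinstance(node, str):  # leaf: species name
        base = column.get(node)
        if base not in BASES:  # gap or missing data is uninformative
            return [1.0] * 4
        return [1.0 if b == base else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, length in node:  # internal node: list of (subtree, branch length)
        child_l = partials(child, column, scale)
        same, diff = jc69(length * scale)
        for x in range(4):
            result[x] *= sum((same if x == y else diff) * child_l[y] for y in range(4))
    return result


def log_likelihood(tree, column, scale=1.0):
    # uniform root distribution under Jukes-Cantor
    return math.log(sum(0.25 * p for p in partials(tree, column, scale)))


def conservation_score(tree, column, rho=0.3):
    """Log-likelihood ratio: constrained (branches scaled by rho) vs. neutral tree.
    Positive values suggest the site evolves more slowly than neutral."""
    return log_likelihood(tree, column, rho) - log_likelihood(tree, column, 1.0)


# Hypothetical unrooted tree with hypothetical branch lengths
TREE = [([("human", 0.1), ("chimp", 0.1)], 0.2), ("mouse", 0.5), ("dog", 0.4)]

conserved_site = {"human": "A", "chimp": "A", "mouse": "A", "dog": "A"}
variable_site = {"human": "A", "chimp": "A", "mouse": "G", "dog": "T"}
```

With these inputs, `conservation_score(TREE, conserved_site)` is positive and `conservation_score(TREE, variable_site)` is negative. The score describes constraint at the site as a whole; ranking individual alleles would need an additional step.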

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-BST-D (51))
Program Officer: Remington, Karin A
Brigham and Women's Hospital
United States
Cassa, Christopher A; Weghorn, Donate; Balick, Daniel J et al. (2017) Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet 49:806-810
Sohail, Mashaal; Vakhrusheva, Olga A; Sul, Jae Hoon et al. (2017) Negative selection in humans and fruit flies involves synergistic epistasis. Science 356:539-542
Chun, Sung; Casparino, Alexandra; Patsopoulos, Nikolaos A et al. (2017) Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet 49:600-605
Sul, Jae Hoon; Cade, Brian E; Cho, Michael H et al. (2016) Increasing Generality and Power of Rare-Variant Tests by Utilizing Extended Pedigrees. Am J Hum Genet 99:846-859
Savova, Virginia; Chun, Sung; Sohail, Mashaal et al. (2016) Genes with monoallelic expression contribute disproportionately to genetic diversity in humans. Nat Genet 48:231-237
Lenz, Tobias L; Spirin, Victor; Jordan, Daniel M et al. (2016) Excess of Deleterious Mutations around HLA Genes Reveals Evolutionary Cost of Balancing Selection. Mol Biol Evol 33:2555-2564
Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle et al. (2015) Identification of cis-suppression of human disease mutations by comparative genomics. Nature 524:225-229
Balick, Daniel J; Do, Ron; Cassa, Christopher A et al. (2015) Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck. PLoS Genet 11:e1005436
Kazanov, Marat D; Roberts, Steven A; Polak, Paz et al. (2015) APOBEC-Induced Cancer Mutations Are Uniquely Enriched in Early-Replicating, Gene-Dense, and Active Chromatin Regions. Cell Rep 13:1103-1109
Francioli, Laurent C; Polak, Paz P; Koren, Amnon et al. (2015) Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47:822-826

Showing the most recent 10 out of 29 publications