Rapid advances in DNA sequencing technology have enabled the large-scale identification and cataloging of human allelic variation in research and clinical settings. A key challenge for human genetics today is to identify, among the myriad alleles, those variants that affect molecular function and phenotypes. We earlier developed computational methods for predicting the functional effect of human mutations and non-synonymous SNPs and implemented these methods in the software tools PolyPhen and, subsequently, PolyPhen-2. We maintain both online and standalone versions of these tools in our laboratory. These tools are widely used by geneticists in a variety of research and clinical applications. The explosion of large-scale population sequencing projects has greatly increased demand for prediction methods. These projects also require significant improvements to the methods and software tailored to specific applications in the new, technologically advanced human genetics. Specifically, massive exome sequencing projects that aim to identify genes harboring rare coding variants involved in human phenotypes require highly accurate, easy-to-use, and fast methods for annotating large numbers of sequence variants. At the same time, DNA sequencing is rapidly becoming a method of choice in clinical genetic diagnostics. Interpretation of novel sequence variants in human disease genes is becoming the major bottleneck in diagnostic analysis of sequencing data. Applications to clinical genetic diagnostics require a substantial increase in the accuracy of prediction methods and the development of methods that target specific protein groups and generate predictions specific to individual diagnostic tests. The current need for interpretation of sequence variants is paralleled by the opportunity to greatly enhance computational methods and software. Genomes of multiple vertebrates provide a rich resource of information for generating predictions. New statistical approaches are needed to optimally employ these data. The recent increase in the size of databases of human mutations and common SNPs provides much larger training and testing datasets. New methods should be developed to fully benefit from these larger datasets.
In Specific Aim 1, we will develop a prediction method, guided by the phylogenetic tree, that uses alignments of vertebrate genomes. We will further incorporate interactions between amino acid positions into the analysis of comparative genomics data to take compensatory substitutions into account.
In Specific Aim 2, we will develop a version of the PolyPhen software for the analysis of exome or genome sequencing datasets. We will integrate functional predictions into statistical tests that detect phenotypic associations of rare non-synonymous variants.
In Specific Aim 3, in close collaboration with clinical geneticists, we will test the feasibility of developing prediction methods specialized for individual diagnostic tests that would achieve clinically useful levels of specificity and sensitivity.

Public Health Relevance

A key challenge for human genetics today is to identify, among the myriad alleles discovered by massive DNA sequencing projects, the genetic variants that affect molecular function and human disease. We earlier developed widely used software for predicting the functional effect of human alleles. We plan to substantially increase the accuracy of the computational prediction method and to adapt the method to the needs of large-scale sequencing projects and specific genetic diagnostic tests.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Eckstrand, Irene A
Brigham and Women's Hospital
United States
Cassa, Christopher A; Jordan, Daniel M; Adzhubei, Ivan et al. (2018) A literature review at genome scale: improving clinical variant assessment. Genet Med 20:936-941
Haghighi, Alireza; Krier, Joel B; Toth-Petroczy, Agnes et al. (2018) An integrated clinical program and crowdsourcing strategy for genomic sequencing and Mendelian disease gene discovery. NPJ Genom Med 3:21
Sohail, Mashaal; Vakhrusheva, Olga A; Sul, Jae Hoon et al. (2017) Negative selection in humans and fruit flies involves synergistic epistasis. Science 356:539-542
Cassa, Christopher A; Weghorn, Donate; Balick, Daniel J et al. (2017) Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet 49:806-810
Chun, Sung; Casparino, Alexandra; Patsopoulos, Nikolaos A et al. (2017) Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet 49:600-605
Sul, Jae Hoon; Cade, Brian E; Cho, Michael H et al. (2016) Increasing Generality and Power of Rare-Variant Tests by Utilizing Extended Pedigrees. Am J Hum Genet 99:846-859
Savova, Virginia; Chun, Sung; Sohail, Mashaal et al. (2016) Genes with monoallelic expression contribute disproportionately to genetic diversity in humans. Nat Genet 48:231-237
Lenz, Tobias L; Spirin, Victor; Jordan, Daniel M et al. (2016) Excess of Deleterious Mutations around HLA Genes Reveals Evolutionary Cost of Balancing Selection. Mol Biol Evol 33:2555-64
Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle et al. (2015) Identification of cis-suppression of human disease mutations by comparative genomics. Nature 524:225-9
Balick, Daniel J; Do, Ron; Cassa, Christopher A et al. (2015) Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck. PLoS Genet 11:e1005436