The focus of genomics research is rapidly shifting from the accumulation of genetic variation data to the functional interpretation of allelic variants. Sequencing studies are becoming the standard approach in all areas of genetics, generating an unprecedented demand for computational methods to predict the functional effect of mutations. We continuously develop and maintain PolyPhen-2, a computational method for predicting the functional effect of missense mutations. PolyPhen-2 makes predictions based on comparative sequence analysis and analysis of protein structure. This method is being widely applied in diverse areas of genetics. In spite of the large user base and our continuing efforts to increase prediction accuracy, there remains ample room, and a great need, to improve the accuracy of the method. Our recent studies on the population genetics of deleterious alleles point to fundamental complexities in the analysis and prediction of deleterious variation. An improved understanding of these complexities, together with new types of training and validation data and algorithmic approaches, positions us to substantially improve the computational method and the software.
We will also expand the utility of the method by addressing previously underserved needs. Gene discovery studies prioritize identified variants both at the gene and the variant level. The question currently addressed by PolyPhen-2 and other prediction methods is whether a given variant is likely to affect gene function. Equally important considerations are whether a gene that harbors this variant is a morbid gene and whether most missense changes in this gene or a domain are likely to have a functional impact. Deep population sequencing data together with catalogs of known disease variants can be used in concert with evolutionary and structural analyses to prioritize genes.
Many large-scale sequencing projects are transitioning from exomes to whole genome sequencing. This transition opens the way for the analysis of non-coding variation. Non-coding variation has been shown to play a key role in the genetics of polygenic complex phenotypes. However, the importance of large-effect non-coding variants for phenotypes that segregate in a Mendelian fashion is unclear and still under debate. Through separately supported whole genome sequencing of cases of Mendelian diseases linked to known loci but lacking protein-coding variants, we will select non-coding Mendelian mutations in an unbiased fashion, analyze the potential underlying biology, and develop a computational predictor. This approach is fundamentally different from existing efforts on the analysis of non-coding variation, which predict conservation or a reduction in sequence diversity rather than directly ascertaining the pathogenic effect.
In Specific Aim 1 we will make substantial improvements in computational methods for predicting the functional effect of mutations and incorporate these improvements into the PolyPhen software.
In Specific Aim 2 we will develop gene-based scores based on population and disease genetics data and integrate them with the variant-based predictions.
In Specific Aim 3 we will extend the prediction to non-coding variation.

Public Health Relevance

The focus of genomics research is rapidly shifting from the accumulation of genetic variation data to the functional interpretation of allelic variants. We will make substantial improvements in computational methods to predict the functional effect of mutations and incorporate these improvements into the PolyPhen software. We will also develop gene-based scores and will extend the prediction methods to non-coding variation.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM078598-10
Application #
9281738
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2007-05-01
Project End
2018-04-30
Budget Start
2017-05-01
Budget End
2018-04-30
Support Year
10
Fiscal Year
2017
Total Cost
Indirect Cost
Name
Brigham and Women's Hospital
Department
Type
DUNS #
030811269
City
Boston
State
MA
Country
United States
Zip Code
02115
Cassa, Christopher A; Jordan, Daniel M; Adzhubei, Ivan et al. (2018) A literature review at genome scale: improving clinical variant assessment. Genet Med 20:936-941
Haghighi, Alireza; Krier, Joel B; Toth-Petroczy, Agnes et al. (2018) An integrated clinical program and crowdsourcing strategy for genomic sequencing and Mendelian disease gene discovery. NPJ Genom Med 3:21
Sohail, Mashaal; Vakhrusheva, Olga A; Sul, Jae Hoon et al. (2017) Negative selection in humans and fruit flies involves synergistic epistasis. Science 356:539-542
Cassa, Christopher A; Weghorn, Donate; Balick, Daniel J et al. (2017) Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet 49:806-810
Chun, Sung; Casparino, Alexandra; Patsopoulos, Nikolaos A et al. (2017) Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet 49:600-605
Sul, Jae Hoon; Cade, Brian E; Cho, Michael H et al. (2016) Increasing Generality and Power of Rare-Variant Tests by Utilizing Extended Pedigrees. Am J Hum Genet 99:846-859
Savova, Virginia; Chun, Sung; Sohail, Mashaal et al. (2016) Genes with monoallelic expression contribute disproportionately to genetic diversity in humans. Nat Genet 48:231-237
Lenz, Tobias L; Spirin, Victor; Jordan, Daniel M et al. (2016) Excess of Deleterious Mutations around HLA Genes Reveals Evolutionary Cost of Balancing Selection. Mol Biol Evol 33:2555-64
Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle et al. (2015) Identification of cis-suppression of human disease mutations by comparative genomics. Nature 524:225-9
Balick, Daniel J; Do, Ron; Cassa, Christopher A et al. (2015) Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck. PLoS Genet 11:e1005436

Showing the most recent 10 out of 31 publications