Rapid advances in DNA sequencing technology have enabled massive identification and cataloging of human allelic variation in research and clinical settings. A key challenge for human genetics today is to identify, among the myriad of alleles, those variants that affect molecular function and phenotypes. We previously developed computational methods for predicting the functional effect of human mutations and non-synonymous SNPs, and implemented them in the software tools PolyPhen and, subsequently, PolyPhen-2. We maintain both online and standalone versions of these tools in our laboratory, and geneticists use them widely in a variety of research and clinical applications. The explosion of large-scale population sequencing projects has greatly increased demand for prediction methods. These projects also call for significant improvements to the methods and for software tailored to specific applications in the new, technologically advanced human genetics. Specifically, massive exome sequencing projects that aim to identify genes harboring rare coding variants involved in human phenotypes require highly accurate, easy-to-use, and fast methods for annotating large numbers of sequence variants. At the same time, DNA sequencing is rapidly becoming the method of choice in clinical genetic diagnostics, where interpretation of novel sequence variants in human disease genes is becoming the major bottleneck in analyzing sequencing data. Applications to clinical genetic diagnostics require a substantial increase in the accuracy of prediction methods, as well as new methods that target specific protein groups and generate predictions specific to individual diagnostic tests.
The current need for interpreting sequence variants is paralleled by an opportunity to greatly enhance computational methods and software. Genomes of multiple vertebrates provide a rich source of information for generating predictions, and new statistical approaches are needed to use these data optimally. The recent growth of databases of human mutations and common SNPs provides much larger training and testing datasets, and new methods should be developed to take full advantage of them.
In Specific Aim 1, we will develop a prediction method guided by the phylogenetic tree that uses alignments of vertebrate genomes. We will further incorporate interactions between amino acid positions into the analysis of comparative genomics data to account for compensatory substitutions.
In Specific Aim 2, we will develop a version of PolyPhen for the analysis of exome and genome sequencing datasets. We will integrate functional predictions into the statistical tests used to detect phenotypic association of rare non-synonymous variants.
In Specific Aim 3, in close collaboration with clinical geneticists, we will test the feasibility of developing prediction methods specialized for individual diagnostic tests that achieve clinically useful levels of specificity and sensitivity.

Public Health Relevance

A key challenge for human genetics today is to identify, among the myriad of alleles discovered by massive DNA sequencing projects, the genetic variants that affect molecular function and human disease. We previously developed widely used software for predicting the functional effect of human alleles. We plan to substantially increase the accuracy of this computational prediction method and to adapt it to the needs of large-scale sequencing projects and specific genetic diagnostic tests.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM078598-07
Application #: 8663922
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Eckstrand, Irene A
Project Start: 2006-04-01
Project End: 2016-04-30
Budget Start: 2014-05-01
Budget End: 2015-04-30
Support Year: 7
Fiscal Year: 2014
Total Cost: $365,925
Indirect Cost: $160,925
Organization: Brigham and Women's Hospital
DUNS #: 030811269
City: Boston
State: MA
Country: United States
Zip Code: 02115
Savova, Virginia; Chun, Sung; Sohail, Mashaal et al. (2016) Genes with monoallelic expression contribute disproportionately to genetic diversity in humans. Nat Genet 48:231-7
Lenz, Tobias L; Spirin, Victor; Jordan, Daniel M et al. (2016) Excess of Deleterious Mutations around HLA Genes Reveals Evolutionary Cost of Balancing Selection. Mol Biol Evol 33:2555-64
Sul, Jae Hoon; Cade, Brian E; Cho, Michael H et al. (2016) Increasing Generality and Power of Rare-Variant Tests by Utilizing Extended Pedigrees. Am J Hum Genet 99:846-859
Kazanov, Marat D; Roberts, Steven A; Polak, Paz et al. (2015) APOBEC-Induced Cancer Mutations Are Uniquely Enriched in Early-Replicating, Gene-Dense, and Active Chromatin Regions. Cell Rep 13:1103-9
Do, Ron; Balick, Daniel; Li, Heng et al. (2015) No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat Genet 47:126-31
Balick, Daniel J; Do, Ron; Cassa, Christopher A et al. (2015) Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck. PLoS Genet 11:e1005436
Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle et al. (2015) Identification of cis-suppression of human disease mutations by comparative genomics. Nature 524:225-9
Francioli, Laurent C; Polak, Paz P; Koren, Amnon et al. (2015) Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47:822-6
Agarwala, Vineeta; Flannick, Jason; Sunyaev, Shamil et al. (2013) Evaluating empirical bounds on complex disease genetic architecture. Nat Genet 45:1418-27
Thompson, Bryony A; Greenblatt, Marc S; Vallee, Maxime P et al. (2013) Calibration of multiple in silico tools for predicting pathogenicity of mismatch repair gene missense substitutions. Hum Mutat 34:255-65

Showing the most recent 10 out of 26 publications