The focus of genomics research is rapidly shifting from the accumulation of genetic variation data to the functional interpretation of allelic variants. Sequencing studies are becoming the standard approach in all areas of genetics, generating an unprecedented demand for computational methods to predict the functional effect of mutations. We continuously develop and maintain PolyPhen-2, a computational method for predicting the functional effect of missense mutations. PolyPhen-2 makes predictions based on comparative sequence analysis and analysis of protein structure. This method is being widely applied in diverse areas of genetics. In spite of the large user base and our continuing efforts to increase prediction accuracy, there is ample room for improvement and a great need to improve the accuracy of the method. Our recent studies on the population genetics of deleterious alleles point to fundamental complexities in the analysis and prediction of deleterious variation. The improved understanding of these complexities, new types of training and validation data, and algorithmic approaches position us to substantially improve the computational method and the software. We will also expand the utility of the method by addressing previously underserved needs. Gene discovery studies prioritize identified variants at both the gene and the variant level. The question currently addressed by PolyPhen-2 and other prediction methods is whether a given variant is likely to affect gene function. Equally important considerations are whether the gene that harbors this variant is a morbid gene and whether most missense changes in this gene or domain are likely to have a functional impact. Deep population sequencing data together with catalogs of known disease variants can be used in concert with evolutionary and structural analyses to prioritize genes. Many large-scale sequencing projects are transitioning from exomes to whole-genome sequencing.
This transition opens the way for the analysis of non-coding variation. Non-coding variation has been shown to play a key role in the genetics of polygenic complex phenotypes. However, the importance of large-effect non-coding variants for phenotypes that segregate in a Mendelian fashion is unclear and still under debate. Through separately supported whole-genome sequencing of cases of Mendelian diseases linked to known loci but lacking protein-coding variants, we will select non-coding Mendelian mutations in an unbiased fashion, analyze the potential underlying biology, and develop a computational predictor. This approach is fundamentally different from existing efforts on the analysis of non-coding variation, which predict conservation or a reduction in sequence diversity rather than directly ascertaining the pathogenic effect.
In Specific Aim 1 we will make substantial improvements in computational methods for predicting the functional effect of mutations and incorporate these improvements into the PolyPhen software.
In Specific Aim 2 we will develop gene-based scores based on population and disease genetics data and integrate them with the variant-based predictions.
In Specific Aim 3 we will extend the prediction to non-coding variation.
The focus of genomics research is rapidly shifting from the accumulation of genetic variation data to the functional interpretation of allelic variants. We will make substantial improvements in computational methods to predict the functional effect of mutations and incorporate these improvements into the PolyPhen software. We will also develop gene-based scores and will extend the prediction methods to non-coding variation.