Understanding the relationship between genotype and phenotype is the central goal of genetics. Available heritability estimates for many human traits of medical relevance suggest that 30-80% of phenotypic variation is due to underlying genetic variation. The ability to predict phenotypes from genotypes is the ultimate test of our understanding of complex trait genetics. Since the dawn of complex trait genetics in the early 20th century, progress has been limited by the availability of genetic data in well-phenotyped populations. Now, owing to extraordinary progress in technology, microarray genotyping, exome sequencing and targeted sequencing datasets are available for large clinically phenotyped populations, and functional data are becoming available. A future explosion of whole-genome sequencing data is also widely anticipated. This shifts the focus from data acquisition to data interpretation and to the development of computational and statistical methods for predicting phenotypes from genotypes and functional information. We propose to develop new methods for predicting phenotypes from genotypes and to apply these methods to newly collected data on human complex traits of direct medical interest, including both quantitative and disease traits. Our work on phenotype prediction will be informative about the allelic architecture of complex traits and will provide guidance for future genetic studies. From a practical perspective, there is an ongoing debate about the potential of genetic diagnostics to identify individuals at elevated risk for specific complex diseases early in life. If successful, genetic diagnostics may inform the selection of patients for early therapeutic intervention. However, the practical utility of genetics in evaluating the risk of complex diseases has not been proven. We will rigorously test whether genotype-based phenotype predictions have practical utility.
In Specific Aim 1 we will develop and test new statistical methods for predicting phenotypes from microarray genotyping data. We will investigate several model selection and shrinkage strategies. We will evaluate whether it is more efficient to estimate contributions of individual markers independently or to fit all markers simultaneously.
In Specific Aim 2 we will improve polygenic prediction in populations of diverse ancestry. It is important that medical progress not be limited to European populations. Our methods will generate predictions across human populations, accounting for population differences in allele frequencies, rates of allelic variation and patterns of linkage disequilibrium.
In Specific Aim 3 we will develop and test statistical methods for predicting phenotypes from sequencing data. Sequencing data provide a distinct set of statistical challenges because they contain low-frequency and rare allelic variants, and often the effects of individual rare variants cannot be estimated.
In Specific Aim 4 we will incorporate functional data into methods for phenotype prediction. We will investigate whether incorporation of functional data can improve phenotype predictions from genetic data.
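The comparison posed in Specific Aim 1, estimating marker contributions independently versus fitting all markers simultaneously with shrinkage, can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the proposed methodology: genotypes are simulated as independent markers (no linkage disequilibrium), the heritability is treated as known when setting the ridge penalty, and all function names and parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate(n, m, n_causal, h2, rng):
    """Simulate 0/1/2 genotypes at independent markers and a quantitative trait.
    Assumption: no linkage disequilibrium and additive effects only."""
    freqs = rng.uniform(0.05, 0.5, size=m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    beta = np.zeros(m)
    causal = rng.choice(m, size=n_causal, replace=False)
    beta[causal] = rng.normal(size=n_causal)
    g = G @ beta
    g = (g - g.mean()) / g.std() * np.sqrt(h2)   # genetic variance = h2
    y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)
    return G, y

def standardize(G_train, G_test):
    """Center and scale both sets using training-sample statistics only."""
    mu, sd = G_train.mean(axis=0), G_train.std(axis=0)
    return (G_train - mu) / sd, (G_test - mu) / sd

def marginal_effects(X, y):
    """One-marker-at-a-time least-squares slopes."""
    yc = y - y.mean()
    return (X.T @ yc) / (X ** 2).sum(axis=0)

def ridge_effects(X, y, lam):
    """Joint fit of all markers with an L2 (ridge) penalty."""
    yc = y - y.mean()
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ yc)

def r2(pred, y):
    """Squared correlation between predicted and observed phenotype."""
    return np.corrcoef(pred, y)[0, 1] ** 2

rng = np.random.default_rng(0)
n_train, n_test, m, h2 = 800, 200, 200, 0.5      # hypothetical sizes
G, y = simulate(n_train + n_test, m, n_causal=20, h2=h2, rng=rng)
X_train, X_test = standardize(G[:n_train], G[n_train:])
y_train, y_test = y[:n_train], y[n_train:]

lam = m * (1.0 - h2) / h2                        # assumes h2 is known
b_marginal = marginal_effects(X_train, y_train)
b_ridge = ridge_effects(X_train, y_train, lam)

r2_marginal = r2(X_test @ b_marginal, y_test)
r2_ridge = r2(X_test @ b_ridge, y_test)
print(f"held-out r^2, marginal estimates: {r2_marginal:.3f}")
print(f"held-out r^2, joint ridge fit:    {r2_ridge:.3f}")
```

Because the simulated markers are independent, the two strategies may perform similarly here; the contrast is expected to matter most when markers are correlated, which this sketch deliberately does not model.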

Public Health Relevance

Deciphering the DNA of individual patients opens the prospect of predicting genetic risk for common human diseases. We will develop statistical methods for predicting disease risk from the collective action of many genes. We will test the feasibility of disease risk prediction given the rapidly growing amount of genetic and functional data.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
Brigham and Women's Hospital
United States