Understanding the relationship between genotype and phenotype is the central goal of genetics. Available heritability estimates for many human traits of medical relevance suggest that 30-80% of phenotypic variation is due to underlying genetic variation. The ability to predict phenotypes from genotypes is the ultimate test of our understanding of complex trait genetics. Since the dawn of complex trait genetics in the early 20th century, progress has been limited by the availability of genetic data in well-phenotyped populations. Now, owing to extraordinary progress in technology, microarray genotyping, exome sequencing and targeted sequencing datasets are available for large clinically phenotyped populations, and functional data are becoming available. A future explosion of whole-genome sequencing data is also widely anticipated. This shifts the focus from data acquisition to data interpretation and to the development of computational and statistical methods for predicting phenotypes from genotypes and functional information. We propose to develop new methods for predicting phenotypes from genotypes and to apply these methods to newly collected data on human complex traits of direct medical interest, including both quantitative and disease traits. Our work on phenotype prediction will be informative about the allelic architecture of complex traits and will provide guidance for future genetic studies. From a practical perspective, there is an ongoing debate about the potential of genetic diagnostics to identify individuals at elevated risk for specific complex diseases early in life. If successful, genetic diagnostics may inform the selection of patients for early therapeutic intervention. However, the practical utility of genetics in evaluating the risk of complex diseases has not been proven and is widely debated. We will rigorously test the hypothesis that genotype-based phenotypic predictions have practical utility.
In Specific Aim 1 we will develop and test new statistical methods for predicting phenotypes from microarray genotyping data. We will investigate several model selection and shrinkage strategies. We will evaluate whether it is more efficient to estimate contributions of individual markers independently or to fit all markers simultaneously.
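The contrast in Aim 1 between estimating marker contributions independently and fitting all markers simultaneously can be illustrated with a minimal sketch. It assumes a genotype matrix of allele counts (individuals by markers) and a quantitative phenotype, and it uses ridge regression as one example of a shrinkage strategy. The function names, the simulated data and the penalty value are illustrative assumptions, not the methods proposed in this project.

```python
import numpy as np


def marginal_effects(X, y):
    """Estimate each marker's effect independently (one simple regression per marker)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = (Xc ** 2).sum(axis=0)
    denom[denom == 0] = 1.0  # monomorphic markers get an effect of zero
    return Xc.T @ yc / denom


def ridge_effects(X, y, lam):
    """Fit all markers simultaneously with an L2 (ridge) shrinkage penalty lam."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    m = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(m), Xc.T @ yc)


def prediction_correlation(X_train, y_train, X_test, y_test, beta):
    """Correlation between predicted and observed phenotype in held-out individuals."""
    pred = (X_test - X_train.mean(axis=0)) @ beta + y_train.mean()
    return float(np.corrcoef(pred, y_test)[0, 1])


def demo(seed=0):
    # Simulated data (assumption): independent markers, small additive effects.
    rng = np.random.default_rng(seed)
    n, m = 600, 100
    G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    beta_true = rng.normal(0.0, 0.15, size=m)
    y = G @ beta_true + rng.normal(0.0, 1.0, size=n)

    G_train, G_test = G[:450], G[450:]
    y_train, y_test = y[:450], y[450:]

    b_marg = marginal_effects(G_train, y_train)
    b_ridge = ridge_effects(G_train, y_train, lam=50.0)  # lam chosen arbitrarily

    return {
        "marginal": prediction_correlation(G_train, y_train, G_test, y_test, b_marg),
        "ridge": prediction_correlation(G_train, y_train, G_test, y_test, b_ridge),
    }


if __name__ == "__main__":
    print(demo())
```

In practice the penalty would be chosen by cross-validation, and which approach predicts better depends on sample size, linkage disequilibrium between markers and the underlying allelic architecture.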
In Specific Aim 2 we will improve polygenic prediction in populations of diverse ancestry. It is important that medical progress not be limited to European populations. Our methods will generate predictions across human populations, accounting for population differences in allele frequencies, rates of allelic variation and patterns of linkage disequilibrium.
In Specific Aim 3 we will develop and test statistical methods for predicting phenotypes from sequencing data. Sequencing data provide a distinct set of statistical challenges because they contain low-frequency and rare allelic variants, and often the effects of individual rare variants cannot be estimated.
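One way to see why rare variants pose a distinct challenge: when each variant is carried by only a handful of individuals, a per-variant effect estimate is very noisy. A widely used workaround, shown here only as an illustrative sketch and not as the method proposed in this aim, is to collapse the rare variants in a region into a single per-individual burden count. The frequency threshold and function names below are assumptions.

```python
import numpy as np


def rare_variant_mask(G, maf_threshold=0.01):
    """Flag variants whose minor allele frequency is below maf_threshold.

    G holds allele counts (0, 1 or 2) with individuals in rows, variants in columns.
    """
    G = np.asarray(G, dtype=float)
    alt_freq = G.mean(axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return maf < maf_threshold


def burden_scores(G, mask):
    """Sum rare-allele counts across the flagged variants for each individual."""
    G = np.asarray(G, dtype=float)
    return G[:, mask].sum(axis=1)


if __name__ == "__main__":
    # Toy region: four individuals, three variants (hypothetical data).
    G = np.array([[0, 1, 0],
                  [1, 0, 2],
                  [0, 0, 2],
                  [0, 0, 1]])
    mask = rare_variant_mask(G, maf_threshold=0.2)
    print(mask, burden_scores(G, mask))
```

The burden count can then enter a prediction model as a single predictor, at the cost of assuming that the collapsed variants act in the same direction.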
In Specific Aim 4 we will incorporate functional data into methods for phenotype prediction. We will investigate whether incorporation of functional data can improve phenotype predictions from genetic data.

Public Health Relevance

Deciphering the DNA of individual patients opens the prospect of predicting the genetic risk of common human diseases. We will develop statistical methods for predicting disease risk based on the collective action of many genes. We will test the feasibility of disease risk prediction given the rapidly growing amount of genetic and functional data.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M
Brigham and Women's Hospital
United States
Gusev, Alexander; Mancuso, Nicholas; Won, Hyejung et al. (2018) Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Nat Genet 50:538-548
Barfield, Richard; Feng, Helian; Gusev, Alexander et al. (2018) Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol 42:418-433
Loh, Po-Ru; Kichaev, Gleb; Gazal, Steven et al. (2018) Mixed-model association for biobank-scale datasets. Nat Genet 50:906-908
Palamara, Pier Francesco; Terhorst, Jonathan; Song, Yun S et al. (2018) High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 50:1311-1317
Loh, Po-Ru; Genovese, Giulio; Handsaker, Robert E et al. (2018) Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559:350-355
Pasaniuc, Bogdan; Price, Alkes L (2017) Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 18:117-127
Márquez-Luna, Carla; Loh, Po-Ru; South Asian Type 2 Diabetes (SAT2D) Consortium et al. (2017) Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol 41:811-823
Sohail, Mashaal; Vakhrusheva, Olga A; Sul, Jae Hoon et al. (2017) Negative selection in humans and fruit flies involves synergistic epistasis. Science 356:539-542
Mancuso, Nicholas; Shi, Huwenbo; Goddard, Pagé et al. (2017) Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. Am J Hum Genet 100:473-487
Chun, Sung; Casparino, Alexandra; Patsopoulos, Nikolaos A et al. (2017) Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet 49:600-605

Showing the most recent 10 out of 22 publications