Improving Polygenic Prediction using Next-Generation Data Sets

Sunyaev, Shamil

Abstract

Understanding the relationship between genotype and phenotype is the central goal of genetics. Available heritability estimates for many human traits of medical relevance suggest that 30-80% of phenotypic variation is due to underlying genetic variation. The ability to predict phenotypes from genotypes is the ultimate test of our understanding of complex trait genetics. Since the dawn of complex trait genetics in the early 20th century, progress has been limited by the availability of genetic data in well-phenotyped populations. Now, owing to extraordinary progress in technology, microarray genotyping, exome sequencing and targeted sequencing datasets are available for large clinically phenotyped populations, and functional data are becoming available. A future explosion of whole-genome sequencing data is also widely anticipated. This shifts the focus from data acquisition to data interpretation and to the development of computational and statistical methods for predicting phenotypes from genotypes and functional information. We propose to develop new methods for predicting phenotypes from genotypes and to apply these methods to newly collected data on human complex traits of direct medical interest, including both quantitative and disease traits. Our work on phenotype prediction will be informative about the allelic architecture of complex traits and will provide guidance for future genetic studies. From a practical perspective, there is an ongoing debate about the potential of genetic diagnostics to identify individuals at elevated risk for specific complex diseases early in life. If successful, genetic diagnostics may inform the selection of patients for early therapeutic intervention. However, the practical utility of genetics in evaluating the risk of complex diseases has not been proven. We will rigorously test whether genotype-based phenotypic predictions have practical utility.
In Specific Aim 1 we will develop and test new statistical methods for predicting phenotypes from microarray genotyping data. We will investigate several model selection and shrinkage strategies. We will evaluate whether it is more efficient to estimate contributions of individual markers independently or to fit all markers simultaneously.
In Specific Aim 2 we will improve polygenic prediction in populations of diverse ancestry. It is important that medical progress not be limited to European populations. Our methods will generate predictions across human populations, accounting for population differences in allele frequencies, rates of allelic variation and patterns of linkage disequilibrium.
In Specific Aim 3 we will develop and test statistical methods for predicting phenotypes from sequencing data. Sequencing data provide a distinct set of statistical challenges because they contain low-frequency and rare allelic variants, and often the effects of individual rare variants cannot be estimated.
In Specific Aim 4 we will incorporate functional data into methods for phenotype prediction. We will investigate whether incorporation of functional data can improve phenotype predictions from genetic data.

Public Health Relevance

Deciphering the DNA of individual patients opens the prospect of predicting genetic risk of common human diseases. We will develop statistical methods for predicting disease risk based on the collective action of many genes. We will test the feasibility of disease risk prediction given the rapidly growing amount of genetic and functional data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM105857-04
Application #: 9245712
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M

Project Start: 2014-06-15
Project End: 2018-02-28
Budget Start: 2017-03-01
Budget End: 2018-02-28
Support Year: 4
Fiscal Year: 2017
Total Cost: $442,450
Indirect Cost: $81,819

Institution

Name: Brigham and Women's Hospital
Type: Independent Hospitals
DUNS #: 030811269

City: Boston
State: MA
Country: United States
Zip Code: 02115

Related projects


NIH 2017 R01 GM	Improving Polygenic Prediction using Next-Generation Data Sets Sunyaev, Shamil / Brigham and Women's Hospital	$442,450
NIH 2016 R01 GM	Improving Polygenic Prediction using Next-Generation Data Sets Sunyaev, Shamil / Brigham and Women's Hospital	$491,613
NIH 2015 R01 GM	Improving Polygenic Prediction using Next-Generation Data Sets Sunyaev, Shamil / Brigham and Women's Hospital	$491,613
NIH 2014 R01 GM	Improving Polygenic Prediction using Next-Generation Data Sets Sunyaev, Shamil / Brigham and Women's Hospital	$543,271

Publications

Gusev, Alexander; Mancuso, Nicholas; Won, Hyejung et al. (2018) Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Nat Genet 50:538-548

Barfield, Richard; Feng, Helian; Gusev, Alexander et al. (2018) Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol 42:418-433

Loh, Po-Ru; Kichaev, Gleb; Gazal, Steven et al. (2018) Mixed-model association for biobank-scale datasets. Nat Genet 50:906-908

Palamara, Pier Francesco; Terhorst, Jonathan; Song, Yun S et al. (2018) High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 50:1311-1317

Loh, Po-Ru; Genovese, Giulio; Handsaker, Robert E et al. (2018) Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559:350-355

Mancuso, Nicholas; Shi, Huwenbo; Goddard, Pagé et al. (2017) Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. Am J Hum Genet 100:473-487

Chun, Sung; Casparino, Alexandra; Patsopoulos, Nikolaos A et al. (2017) Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet 49:600-605

Pasaniuc, Bogdan; Price, Alkes L (2017) Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 18:117-127

Márquez-Luna, Carla; Loh, Po-Ru; South Asian Type 2 Diabetes (SAT2D) Consortium et al. (2017) Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol 41:811-823

Sohail, Mashaal; Vakhrusheva, Olga A; Sul, Jae Hoon et al. (2017) Negative selection in humans and fruit flies involves synergistic epistasis. Science 356:539-542

Showing the most recent 10 out of 22 publications
