Thousands of genome-wide association studies link specific diseases or complex phenotypes to single mutations in the human genome. But translating these results into medical treatments requires a precise understanding of how each mutation contributes to the mechanism of disease. Currently, what is known about the regulatory role of single nucleotide polymorphisms (SNPs) is, for the most part, confined to local, or cis-, expression quantitative trait loci (eQTLs) in a small number of human tissues. But not all diseases or complex phenotypes are mediated by cis-eQTLs. Very few long-distance, or trans-, eQTLs have been identified and validated in human tissues, although trans-eQTLs play an important role in some complex phenotypes. Alternative splicing has also been shown to modulate certain phenotypes; however, little is known about SNPs that regulate alternative splicing. The proposed K99/R00 research seeks to design statistical methods that build gene and transcript networks to identify SNPs that regulate the transcription of genes and mRNA isoforms, both locally and over long distances, and to validate those findings, with the aim of providing insight into the mechanisms underlying complex phenotypes and disease. We propose to leverage the cis-eQTLs identified in our current work, together with human gene expression data, to build precise, directed gene networks on a genome-wide scale. We will build these networks using Bayesian statistical models that compute the probability of a particular network jointly over all of its genes, with the associated eQTLs indicating whether a regulated gene lies upstream or downstream of other genes in the network. We will use Markov chain Monte Carlo and linear programming relaxation methods, which have been shown to find near-optimal solutions to this type of problem. We will then use these networks to identify trans-eQTLs and to quantify the effect of each trans-eQTL in a particular process using Bayesian statistical tests developed in our lab.
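The idea that an eQTL tells us whether a gene is upstream or downstream can be illustrated with a minimal sketch. Assume, hypothetically, that SNP `g` is a known cis-eQTL for gene A and that genes A and B are co-expressed. Two competing structures are g → A → B (A upstream of B) and g → A ← B (B upstream of A). Only the first implies that g is marginally associated with B; only the second implies that g and B become dependent once A is conditioned on. The code compares the maximized log-likelihoods of these two linear-Gaussian DAGs. Both have six free parameters, so BIC penalties cancel, and with equal prior weight the normalized likelihoods act as a crude stand-in for posterior orientation probabilities. This is a swapped-in illustration, not the Bayesian network model proposed here; the function names (`ols_loglik`, `orientation_probability`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols_loglik(y, predictors):
    """Maximized Gaussian log-likelihood of y regressed on predictors plus an intercept."""
    n = len(y)
    X = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    sigma2 = rss / n  # maximum-likelihood residual variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def orientation_probability(g, a, b):
    """Approximate P(A -> B) versus P(B -> A), given that SNP g is a cis-eQTL for gene A."""
    ll_ab = ols_loglik(a, [g]) + ols_loglik(b, [a])    # g -> A -> B
    ll_ba = ols_loglik(b, []) + ols_loglik(a, [g, b])  # g -> A <- B
    # Both DAGs have six free parameters, so BIC penalties cancel; equal prior weight.
    m = max(ll_ab, ll_ba)
    w_ab, w_ba = math.exp(ll_ab - m), math.exp(ll_ba - m)
    return w_ab / (w_ab + w_ba)


def simulate(n, a_regulates_b, seed=0):
    """Hypothetical data: genotypes coded 0/1/2, linear effects, unit Gaussian noise."""
    rng = random.Random(seed)
    g = [rng.choice([0, 1, 2]) for _ in range(n)]
    if a_regulates_b:
        a = [0.8 * gi + rng.gauss(0, 1) for gi in g]
        b = [0.9 * ai + rng.gauss(0, 1) for ai in a]
    else:
        b = [rng.gauss(0, 1) for _ in range(n)]
        a = [0.8 * gi + 0.9 * bi + rng.gauss(0, 1) for gi, bi in zip(g, b)]
    return g, a, b


if __name__ == "__main__":
    print(orientation_probability(*simulate(500, a_regulates_b=True)))
    print(orientation_probability(*simulate(500, a_regulates_b=False)))
```

With the simulated data in which A truly regulates B, the returned probability should be close to 1; simulating the opposite direction should drive it toward 0. A genome-scale network would require searching over many such orientations jointly, which is where the MCMC and linear programming relaxation methods named above come in.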
Subsequently, we propose to exploit the opportunities offered by novel RNA sequencing techniques and nonparametric statistical models to identify the transcript isoforms of each transcribed gene and, simultaneously, to estimate individual-specific transcript levels by extending sparse factor analysis models. This will enable us to identify QTLs that regulate the transcription of specific transcript isoforms through alternative splicing events (tQTLs) by extending our methods for eQTL identification. We will apply the methodology we developed for eQTLs to build networks of transcript isoforms (transcript networks). Finally, we will use transcript networks to identify and quantify tQTLs that regulate individual-specific levels of transcript isoforms, both locally and over long genetic distances, as with eQTLs. We will make all of our methods and results publicly available.
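What distinguishes a tQTL from an eQTL can be shown with a small sketch: a splicing variant may shift the relative proportions of a gene's isoforms without changing the gene's total expression. Assuming individual-specific isoform levels have already been estimated (the proposal does this by extending sparse factor analysis models, which are not reproduced here), the code tests a genotype against both total expression and the proportion of the first isoform, using a permutation test on the absolute Pearson correlation. This frequentist test is a stand-in for illustration only, not the Bayesian tests developed in the lab; the function names (`tqtl_scan`, `permutation_pvalue`, `simulate`) and the simulated two-isoform gene are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(genotype, trait, n_perm=999, seed=1):
    """Two-sided permutation p-value for |corr(genotype, trait)|."""
    rng = random.Random(seed)
    observed = abs(pearson(genotype, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(genotype, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def tqtl_scan(genotype, iso_levels):
    """iso_levels[i] holds the estimated isoform levels of one gene in individual i."""
    totals = [sum(levels) for levels in iso_levels]
    proportions = [levels[0] / total for levels, total in zip(iso_levels, totals)]
    return {
        "total_expression_p": permutation_pvalue(genotype, totals),
        "isoform1_proportion_p": permutation_pvalue(genotype, proportions),
    }


def simulate(n, seed=0):
    """Hypothetical gene with two isoforms; genotype shifts the isoform ratio only."""
    rng = random.Random(seed)
    genotype = [rng.choice([0, 1, 2]) for _ in range(n)]
    iso_levels = []
    for g in genotype:
        total = rng.gauss(100, 10)  # total expression does not depend on genotype
        frac = min(max(0.3 + 0.15 * g + rng.gauss(0, 0.05), 0.01), 0.99)
        iso_levels.append([total * frac, total * (1 - frac)])
    return genotype, iso_levels


if __name__ == "__main__":
    print(tqtl_scan(*simulate(200)))
```

Under this simulation the proportion test should reach the smallest attainable p-value (1/1000 with 999 permutations), while the total-expression test, whose trait is independent of genotype by construction, usually should not. A variant that shows up only in the proportion test is the kind of signal a tQTL scan is meant to catch and a gene-level eQTL scan would miss.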

Public Health Relevance

Thousands of genome-wide association studies link specific diseases or complex traits to single mutations in the human genome, but these results cannot yet be translated into medical treatments: knowing that a mutation is associated with a disease does not tell us how that mutation contributes to the mechanism of disease. Our proposed research will design and validate statistical methods that provide a comprehensive road map to the biological role of the mutations identified in these association studies. With the roles of thousands of possibly disease-related mutations in hand, researchers can begin to piece together the mechanism of a disease and translate their findings into treatments much more quickly.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Transition Award (R00)
Project #
4R00HG006265-02
Application #
8520752
Study Section
Special Emphasis Panel (NSS)
Program Officer
Volpi, Simona
Project Start
2011-09-06
Project End
2015-06-30
Budget Start
2012-09-01
Budget End
2013-06-30
Support Year
2
Fiscal Year
2012
Total Cost
$249,000
Indirect Cost
$90,401
Name
Duke University
Department
None
Type
Schools of Medicine
DUNS #
044387793
City
Durham
State
NC
Country
United States
Zip Code
27705