The Genotype-Tissue Expression (GTEx) program is investing considerable resources in collecting a data set that promises unprecedented opportunities to understand the regulation of gene expression in humans, its modulation across tissues, and its genetic determinants. Varied and far-reaching research programs are likely to develop to interpret and fully understand the results of this investigation. The first step in all of these studies is the identification of loci that influence the expression of one or more genes in one or more tissues, referred to as expression quantitative trait loci (eQTL). This proposal focuses on providing cutting-edge computational and statistical tools that will identify eQTL with high sensitivity and a low false-positive rate, maximizing the yield of the GTEx sample collection and experimental studies. We will tackle, in turn, 1) the development of powerful and sensitive tests; 2) the control of false positives using statistical methodology; and 3) the analysis of the replicability and relevance of the results using independent data sets. Specifically, we will develop, implement in freely distributed software, and apply to GTEx data test statistics that are robust to batch effects and population stratification and adaptable to the coordinated analysis of expression in multiple tissues, allowing information to be borrowed across tissues. We will establish significance thresholds for these test statistics by using and extending cutting-edge approaches that are both mindful of the extremely large number of hypotheses explored and adaptive to the noise structure of the GTEx data. Finally, we will analyze three independent data sets that include genome-wide genotype and expression information as well as rich phenotypic data; this step will allow us to evaluate the reproducibility of our results and to investigate how the identified eQTL relate to higher-order phenotypes.
Comprising investigators at Stanford and UCLA, our team has a record of collaboration and of excellence in each of the areas in which this project presents important challenges.
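The abstract does not specify the test statistics to be developed. As a minimal sketch, assuming a single-tissue additive model fit by ordinary least squares, one common way to make an eQTL test robust to batch effects and population stratification is to include batch indicators and genotype principal components as covariates. By the Frisch–Waugh–Lovell theorem, the genotype t statistic from the full model equals the one obtained by regressing genotype and expression separately on the covariates and correlating the residuals. The function names (`residualize`, `eqtl_t_statistic`) are hypothetical, and the multi-tissue borrowing of information described above is not modeled here.

```python
import math


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def residualize(y, covariates):
    """Residuals of y after OLS on an intercept plus each covariate column.

    Hypothetical helper: covariates might be batch indicators or genotype PCs.
    """
    design = [[1.0] + [cov[i] for cov in covariates] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = _solve(xtx, xty)  # normal equations X'X beta = X'y
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(design, y)]


def eqtl_t_statistic(genotype, expression, covariates):
    """t statistic for the genotype effect on expression, adjusted for covariates.

    Residuals from a model with an intercept have mean zero, so their
    uncentered correlation is the partial correlation r. The full model has
    n - k - 2 residual degrees of freedom (intercept, k covariates, genotype).
    """
    g = residualize(genotype, covariates)
    e = residualize(expression, covariates)
    r = sum(a * b for a, b in zip(g, e)) / math.sqrt(
        sum(a * a for a in g) * sum(b * b for b in e)
    )
    df = len(genotype) - len(covariates) - 2
    return r * math.sqrt(df / (1.0 - r * r))
```

Because expression is residualized on the covariates, adding any multiple of a batch indicator to the expression values leaves the statistic unchanged. To turn t into a p-value, compare it with a Student t distribution on n - k - 2 degrees of freedom (for example with SciPy's `scipy.stats.t.sf`); that step is omitted here to keep the sketch dependency-free.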

Public Health Relevance

The Genotype-Tissue Expression (GTEx) program is collecting a data set that promises unprecedented opportunities to understand how the expression of different genes varies across tissues in the human body and how inter-individual DNA differences affect these expression patterns. Before the GTEx investment can come to fruition, a number of computational and statistical challenges must be addressed. Scientists must sort through enormous amounts of data, looking for connections among them and evaluating their significance against the large number of spurious associations expected in an exploration of this size. This project will develop such computational and statistical approaches and validate their performance in independent data sets.
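Judging significance against the many spurious associations expected from millions of tests is a multiple-testing problem. The abstract does not name the procedure the project will use, and it describes approaches that are adaptive to the noise structure of the data, which goes beyond this example. As a minimal sketch of one standard baseline, the Benjamini–Hochberg step-up procedure controls the false discovery rate at level `alpha`; the function name `benjamini_hochberg` is illustrative.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ranks 1..m by p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k])
```

The step-up rule means a small p-value can be rejected even if it misses its own threshold, provided a larger-ranked p-value passes: for p-values 0.02, 0.03 and 0.04 at `alpha=0.05`, all three are rejected although 0.02 exceeds 0.05/3. Large eQTL analyses frequently pair an FDR procedure with gene-level permutation tests; this sketch shows only the FDR step.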

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Addington, Anjene M
Stanford University
Schools of Medicine
United States
Hormozdiari, Farhad; Gazal, Steven; van de Geijn, Bryce et al. (2018) Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat Genet 50:1041-1047
Zhang, Mingfeng; Lykke-Andersen, Soren; Zhu, Bin et al. (2018) Characterising cis-regulatory variation in the transcriptome of histologically normal and tumour-derived pancreatic tissues. Gut 67:521-533
Olde Loohuis, Loes M; Mangul, Serghei; Ori, Anil P S et al. (2018) Transcriptome analysis in whole blood reveals increased microbial diversity in schizophrenia. Transl Psychiatry 8:96
Mangul, Serghei; Yang, Harry Taegyun; Strauli, Nicolas et al. (2018) ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues. Genome Biol 19:36
Agrawal, A; Chou, Y-L; Carey, C E et al. (2018) Genome-wide association study identifies a novel locus for cannabis dependence. Mol Psychiatry 23:1293-1302
Gai, Lisa; Eskin, Eleazar (2018) Finding associated variants in genome-wide association studies on multiple traits. Bioinformatics 34:i467-i474
Rahmani, Elior; Schweiger, Regev; Shenhav, Liat et al. (2018) BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Genome Biol 19:141
Kang, Eun Yong; Lee, Cue Hyunkyu; Furlotte, Nicholas A et al. (2018) An Association Mapping Framework To Account for Potential Sex Difference in Genetic Architectures. Genetics 209:685-698
Gamazon, Eric R; Segrè, Ayellet V; van de Bunt, Martijn et al. (2018) Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nat Genet 50:956-967
Collado-Torres, Leonardo; Nellore, Abhinav; Kammers, Kai et al. (2017) Reproducible RNA-seq analysis using recount2. Nat Biotechnol 35:319-321

The 10 most recent of the project's 75 publications are listed above.