The Genotype-Tissue Expression (GTEx) program is investing considerable resources in collecting a data set that promises unprecedented opportunities to understand how gene expression is regulated in humans, how it is modulated across tissues, and what its genetic determinants are. Varied and far-reaching research programs will likely develop to interpret and fully understand the results of this investigation. The first step in all of these studies is identifying the loci that influence the expression of one or more genes in one or more tissues, referred to as expression quantitative trait loci (eQTL). This proposal focuses on providing cutting-edge computational and statistical tools that identify eQTL with high sensitivity and a low false positive rate, maximizing the yield of the GTEx sample collection and experimental studies. We will tackle, in turn, 1) the development of powerful and sensitive tests; 2) the control of false positives using statistical methodology; and 3) the analysis of the replicability and relevance of the results using independent data sets. Specifically, we will develop test statistics that are robust to batch effects and population stratification, yet adaptable to the coordinated analysis of expression in multiple tissues, allowing information to be borrowed across tissues; we will implement these statistics in freely distributed software and apply them to GTEx data. We will establish significance thresholds for these test statistics by using and extending cutting-edge approaches that are both mindful of the extremely large number of hypotheses explored and adaptive to the noise structure of the GTEx data. Finally, we will analyze three independent data sets that include genome-wide genotype and expression information as well as rich phenotypic data: this step will allow us to evaluate the reproducibility of our results and to investigate how the identified eQTL relate to higher-order phenotypes.
Our team, comprising investigators at Stanford and UCLA, has a record of collaboration and excellence in each of the areas that pose the main challenges of this project.
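The significance thresholds described above must account for the enormous number of SNP-gene-tissue tests in an eQTL scan. As a point of reference only, the following sketch implements the standard Benjamini-Hochberg step-up procedure for false discovery rate (FDR) control; it is not the project's method, which the abstract describes as extending such approaches to adapt to the noise structure of GTEx data. The function name `benjamini_hochberg`, the level `q`, and the p-values in the example are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Benjamini-Hochberg step-up procedure (hypothetical helper, illustration only).

    Returns the sorted indices of the hypotheses rejected at false discovery rate q.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Rank hypotheses by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical p-values for three SNP-gene association tests.
    print(benjamini_hochberg([0.04, 0.045, 0.03], q=0.05))  # prints [0, 1, 2]
```

In this toy example neither of the two smallest p-values clears its own rank-specific threshold, yet all three hypotheses are rejected because the largest p-value meets q at rank m. A Bonferroni cutoff of q/m ≈ 0.0167 would reject none of them, which is one reason FDR control is widely used when very many hypotheses are tested.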

Public Health Relevance

The Genotype-Tissue Expression (GTEx) program is collecting a data set that promises unprecedented opportunities to understand how the expression of different genes varies across tissues in the human body and how inter-individual DNA differences affect these expression patterns. Before the GTEx investment can come to fruition, a number of computational and statistical challenges must be addressed. Scientists must sort through enormous amounts of data, searching for associations between DNA differences and gene expression, and evaluate the significance of each finding against the large number of spurious associations expected in such a large-scale exploration. This project will develop such computational and statistical approaches and validate their performance in an independent data set.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-GGG-H (50))
Program Officer: Addington, Anjene M
Stanford University, Schools of Medicine, United States