The Genotype-Tissue Expression (GTEx) program is investing considerable resources in the collection of a data set that promises to offer unprecedented opportunities to understand gene expression regulation in humans, its modulation across tissues, and its genetic determinants. It is easy to imagine that varied and far-reaching research programs will develop to interpret and fully understand the results of this investigation. The first step in all of these studies is the identification of loci, referred to as expression quantitative trait loci (eQTL), that influence the expression of one or more genes in one or more tissues. This proposal focuses on providing cutting-edge computational and statistical tools that will identify eQTL with high sensitivity and a low false positive rate, maximizing the yield of the GTEx sample collection and experimental studies. We will tackle in turn 1) the development of powerful and sensitive tests; 2) the control of false positives using statistical methodology; and 3) the analysis of the replicability and relevance of the results using independent data sets. Specifically, we will develop, implement in freely distributed software, and apply to GTEx data test statistics that are robust to batch effects and population stratification and that can be adapted to the coordinated analysis of expression in multiple tissues, allowing information to be borrowed across tissues. We will establish thresholds for the statistical significance of these test statistics by using and extending cutting-edge approaches that are both mindful of the extremely large number of hypotheses explored and adaptive to the noise structure of the GTEx data. Finally, we will analyze three independent data sets that include genome-wide genotype and expression information as well as rich phenotypic data: this step will allow us to evaluate the reproducibility of our results and to investigate how the identified eQTL relate to higher-order phenotypes.
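To make the first two aims concrete, here is a minimal Python sketch under stated assumptions; it is not the project's method. Each SNP-gene pair is tested by regressing expression on genotype while adjusting for nuisance covariates (here a hypothetical batch indicator; genotype principal components would play the same role for population stratification), and the resulting p-values are thresholded with the Benjamini-Hochberg false discovery rate procedure, a standard baseline swapped in for illustration. The function names `eqtl_pvalue` and `benjamini_hochberg`, the simulated data, and the normal approximation to the t distribution (reasonable only for large samples) are all assumptions. The sketch does not attempt multi-tissue borrowing of information.

```python
import math

import numpy as np


def eqtl_pvalue(expr, genotype, covariates):
    """Two-sided p-value for the genotype effect on expression in one SNP-gene pair.

    Fits expr ~ 1 + genotype + covariates by ordinary least squares.
    covariates is an (n, k) array of hypothetical nuisance variables, e.g. batch
    indicators or genotype principal components. The p-value uses a normal
    approximation to the t distribution, which assumes a large sample size.
    """
    n = len(expr)
    X = np.column_stack([np.ones(n), genotype, covariates])
    beta, _, rank, _ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    sigma2 = resid @ resid / (n - rank)            # residual variance
    cov_beta = sigma2 * np.linalg.pinv(X.T @ X)    # covariance of the estimates
    t_stat = beta[1] / math.sqrt(cov_beta[1, 1])   # column 1 holds the genotype
    return math.erfc(abs(t_stat) / math.sqrt(2))   # P(|Z| > |t|) for Z ~ N(0, 1)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Simulated example (hypothetical data): 20 SNPs, only SNP 0 affects expression,
# and a batch variable shifts expression for roughly half of the samples.
rng = np.random.default_rng(0)
n_samples, n_snps = 500, 20
batch = rng.integers(0, 2, n_samples).astype(float)
genotypes = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
expression = 0.5 * genotypes[:, 0] + 1.0 * batch + rng.normal(size=n_samples)

pvals = [eqtl_pvalue(expression, genotypes[:, j], batch[:, None]) for j in range(n_snps)]
hits = benjamini_hochberg(pvals, alpha=0.05)
```

Benjamini-Hochberg treats the tests as independent or positively dependent; a genome-wide eQTL scan also has to contend with the strong correlation among nearby SNPs tested against the same gene, which this sketch ignores.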
Comprising investigators at Stanford and UCLA, our team has a track record of collaboration and excellence in each of the areas that pose the main challenges of this project.
The Genotype-Tissue Expression (GTEx) program is collecting a data set that promises to offer unprecedented opportunities to understand how the expression of different genes varies across tissues in the human body and how inter-individual DNA differences affect these expression patterns. Before the GTEx investment can come to fruition, a number of computational and statistical challenges must be addressed. Scientists must sort through enormous amounts of data, looking for associations between genetic variants and gene expression, and evaluate their significance against the large number of spurious associations expected in such an extensive exploration. This project will develop such computational and statistical approaches and validate their performance in independent data sets.