The Genotype-Tissue Expression (GTEx) program is investing considerable resources in the collection of a data set that promises to offer unprecedented opportunities to understand gene expression regulation in humans, its modulation across tissues, and its genetic determinants. It is easy to imagine that varied and far-reaching research programs will develop to interpret and fully understand the results of this investigation. The first step of all these studies is the identification of loci, referred to as expression quantitative trait loci (eQTL), that influence the expression of one or more genes in one or more tissues. This proposal focuses on providing cutting-edge computational and statistical tools that will allow identification of eQTL with high sensitivity and at a low false positive rate, maximizing the yield of the GTEx sample collection and experimental studies. We will tackle in turn 1) the development of powerful and sensitive tests; 2) the control of false positives using statistical methodology; and 3) the analysis of replicability and relevance of the results using independent data sets. Specifically, we will develop, implement in software to be freely distributed, and apply to GTEx data, test statistics that are robust to batch effects and population stratification, yet adaptable to the coordinated analysis of expression in multiple tissues, allowing for the borrowing of information across tissues. We will establish thresholds for statistical significance of these test statistics by using and extending cutting-edge approaches that will be both mindful of the extremely large number of hypotheses explored and adaptive to the noise structure of the GTEx data. Finally, we will analyze three independent data sets that include genome-wide genotype and expression information as well as rich phenotypic data: this step will allow us to evaluate the reproducibility of our results, as well as to investigate how the identified eQTLs relate to higher-order phenotypes.
Comprising investigators at Stanford and UCLA, our team has a record of collaboration and excellence in each of the areas that represent important challenges presented by this project.
The Genotype-Tissue Expression (GTEx) program is collecting a data set that promises to offer unprecedented opportunities to understand how the expression of different genes varies across tissues in the human body and how inter-individual DNA differences affect these expression patterns. Before the GTEx investment can come to fruition, a number of computational and statistical challenges must be addressed. Scientists must sort through enormous amounts of data, look for connections within them, and evaluate the significance of those connections against the large number of spurious associations that are to be expected in such a large exploration. This project will develop such computational and statistical approaches, validating their performance in an independent dataset.
|Mangul, Serghei; Caciula, Adrian; Al Seesi, Sahar et al. (2014) Transcriptome assembly and quantification from Ion Torrent RNA-Seq data. BMC Genomics 15 Suppl 5:S7|
|Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong et al. (2014) Identifying causal variants at loci with multiple signals of association. Genetics 198:497-508|
|Hasin-Brumshtein, Yehudit; Hormozdiari, Farhad; Martin, Lisa et al. (2014) Allele-specific expression and eQTL analysis in mouse adipose tissue. BMC Genomics 15:471|
|Mangul, Serghei; Wu, Nicholas C; Mancuso, Nicholas et al. (2014) Accurate viral population assembly from ultra-deep sequencing data. Bioinformatics 30:i329-37|
|Kang, Eun Yong; Han, Buhm; Furlotte, Nicholas et al. (2014) Meta-analysis identifies gene-by-environment interactions as demonstrated in a study of 4,965 mice. PLoS Genet 10:e1004022|
|Joo, Jong Wha J; Sul, Jae Hoon; Han, Buhm et al. (2014) Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Genome Biol 15:r61|