Genome-wide analyses of associations between genetic variation and quantitative traits such as gene expression improve our understanding of the role of genetic variation in common diseases. Population structure is a confounding factor in observed associations between genotype and phenotypes such as gene expression. The long-term goal of this research training proposal is to develop statistical methodology that models the population structure shared between genotype and phenotype, using statistical techniques developed for probabilistic graphical models. We will apply our methodology to the Common Fund Genotype-Tissue Expression (GTEx) dataset, producing observations of the background correlations between genetic variation and expression that are due to shared population structure, and providing expression quantitative trait loci (eQTLs) that are statistically significant compared to the background expectation. By providing a better understanding of the effects of population structure on associations between the genome and gene expression, we will improve our understanding of the genetic basis of complex common human diseases.
Understanding the fundamental relationship between variation in the human genome and human health and disease is a primary goal of biology and medicine. The relationship between the human genome and disease is obscured by the natural variation in the genome due to population differences. We hope to improve our understanding of the effects of these population differences on the relationships we observe between variation in the human genome and disease.