Gene expression is an important molecular phenotype and the first step in bridging the divide between static genomic information and dynamic organismal phenotypes. A nearly universal finding of studies to date is that cis-regulatory variation, located primarily in non-coding regions, is pervasive and a significant source of heritable gene expression variation. However, the functional consequences of non-coding variation have been difficult to assess on a genome-wide scale. Digital DNase I footprinting has recently emerged as a powerful approach for identifying in vivo DNA-protein interactions. In Aim 1, we will use digital DNase I footprinting to systematically interrogate the functional significance of non-coding variation. We will do this by generating a comprehensive, nucleotide-resolution map of in vivo DNA-protein interactions in 40 genetically diverse yeast strains and species (38 strains of Saccharomyces cerevisiae, one strain of S. paradoxus, and one strain of S. bayanus). These data will yield fundamental insights into natural variation in in vivo protein binding sites and into the evolutionary forces that shape regulatory sequence variation within and between species.
In Aim 2, we will perform deep RNA-Seq on all 40 strains and species and relate polymorphisms that alter in vivo DNA-protein interactions to gene expression levels. This will provide one of the largest compendia of functional regulatory alleles generated to date. We will also use this unique dataset to develop statistical methods for predicting functionally significant non-coding variation. Successful completion of the proposed project will have several outcomes. It will provide the foundation for a more principled understanding of non-coding variation and facilitate the translation of static genomic information into predictive, quantitative models of transcript abundance. It will also enable the interpretation of sequence variation in the context of personal genomics initiatives and yield new insights into the evolution of gene expression levels.
Gene expression is an important step in converting DNA sequence information into phenotypes, and disrupting how much of a gene's product is made, or when it is made, can lead to disease. The proposed project will develop new experimental and statistical tools for understanding mutations that influence gene expression levels, which will be critical for interpreting, understanding, and ameliorating human disease.