Gene expression is an important molecular phenotype, providing the initial step in bridging the divide between static genomic information and dynamic organismal phenotypes. A nearly ubiquitous observation of studies performed to date is that cis-regulatory variation, primarily located in non-coding regions, is pervasive and a significant source of heritable gene expression variation. However, the functional consequences of non-coding variation have been difficult to assess on a genome-wide scale. Recently, digital DNase I footprinting has emerged as a powerful approach to identify in vivo DNA-protein interactions. In Aim 1, we will leverage digital DNase I footprinting to systematically interrogate the functional significance of non-coding variation by developing a comprehensive, nucleotide-resolution map of in vivo DNA-protein interactions in 40 genetically diverse yeast strains and species (38 strains of Saccharomyces cerevisiae, one strain of S. paradoxus, and one strain of S. bayanus). These data will yield fundamental insights into natural variation in in vivo protein binding sites and the evolutionary forces shaping patterns of regulatory sequence variation within and between species.
In Aim 2, we will perform deep RNA-Seq on all 40 strains and species and correlate polymorphisms that alter in vivo DNA-protein interactions with gene expression levels, providing one of the largest compendiums of functional regulatory alleles generated to date. Importantly, we will also use this unique dataset to develop statistical methods for predicting functionally significant non-coding variation. The successful completion of the proposed project will provide the foundation for a more principled understanding of non-coding variation, facilitate the translation of static genomic information into predictive and quantitative models of transcript abundance, enable the interpretation of sequence variation in the context of personal genomics initiatives, and yield new insights into the evolution of gene expression levels.

Public Health Relevance

Gene expression is an important step in the process of converting DNA sequence information into phenotypes, and disrupting how much of a gene product is made, or when it is made, can lead to disease. The proposed project will develop important new experimental and statistical tools to understand mutations that influence gene expression levels, which will be critical for interpreting, understanding, and ameliorating human disease.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
University of Washington
Schools of Medicine
United States
Connelly, Caitlin F; Wakefield, Jon; Akey, Joshua M (2014) Evolution and genetic architecture of chromatin accessibility and function in yeast. PLoS Genet 10:e1004427
Fu, Wenqing; Akey, Joshua M (2013) Selection and adaptation in the human genome. Annu Rev Genomics Hum Genet 14:467-89
Connelly, Caitlin F; Skelly, Daniel A; Dunham, Maitreya J et al. (2013) Population genomics and transcriptional consequences of regulatory motif variation in globally diverse Saccharomyces cerevisiae strains. Mol Biol Evol 30:1605-13
Skelly, Daniel A; Merrihew, Gennifer E; Riffle, Michael et al. (2013) Integrative phenomics reveals insight into the structure of phenotypic diversity in budding yeast. Genome Res 23:1496-504