The study of gene regulation is increasingly informed by the variation within and between species: by determining how sequence variants alter expression, we come closer to understanding how regulatory information is encoded. Expression variation is typically driven by both cis-acting changes to regulatory sequences and trans-acting changes to the concentration or activity of the proteins that bind them. Both of these effects are integrated at the level of protein-DNA binding. This binding can be quantified by ChIP-seq, but previous approaches to this problem have failed to differentiate cis- and trans-driven binding variation, blurring their separate impacts and complicating the discovery of their causal sequence variants. Furthermore, ChIP-seq suffers from low resolution, making it difficult to resolve the impact of individual binding sites, and is intrinsically limited to small numbers of factors, restricting its scope and making it difficult to study how factors interact. We propose to address these shortcomings by collecting quantitative, high-resolution, and comprehensive binding data in three closely related Drosophila species and their hybrids, centering this work around the transcription factor Mef2. Mef2 binds a wide range of targets and is unusually tractable, and we have found that it exhibits expression variation between our target species, potentially acting as a source of trans-acting variation. We will use ChIP-seq in interspecies hybrids to assess allele-specific binding of Mef2, allowing us to separate the contributions of cis- and trans-acting sequence variation to each binding event. We will observe how this variation acts upon individual sites by employing a recently described modification of the ChIP-seq protocol that achieves base-pair resolution. Finally, we will explore two complementary experimental methods that utilize new analysis tools to generate comprehensive protein-DNA binding data.
We will align these sources of data with matching RNA-seq data already generated by the Wittkopp lab, allowing us to deploy a series of statistical models to address several fundamental questions in the study of gene regulation, each either for the first time or in substantially greater depth. By better describing the link between sequence and expression, this work will contribute to the discovery of sequence variation underlying the characteristic misexpression patterns of different cancers and genetic diseases, helping to illuminate their root genetic causes. In particular, sequence variants creating binding sites for Mef2 have been implicated in lung cancer risk, and this work will help illustrate their relationship with tumorigenesis.

Public Health Relevance

Many cancers and inherited diseases are thought to be caused by problems in the cryptic instructions in cells' DNA that dictate where, when, and in what quantity to produce each cell's basic building blocks. If we could better understand these instructions, then we could predict the root causes of these diseases, spurring the generation of new therapies. This project will take advantage of new techniques to address several fundamental questions regarding the nature of these instructions.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Postdoctoral Individual National Research Service Award (F32)
Project #
5F32GM100685-02
Application #
8580526
Study Section
Special Emphasis Panel (ZRG1-F08-Q (20))
Program Officer
Reddy, Michael K
Project Start
2012-12-01
Project End
2014-11-30
Budget Start
2013-12-01
Budget End
2014-11-30
Support Year
2
Fiscal Year
2014
Total Cost
$52,190
Indirect Cost
Name
University of Michigan Ann Arbor
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109