The study of gene regulation is increasingly informed by the variation within and between species: by determining how sequence variants alter expression, we come closer to understanding how regulatory information is encoded. Expression variation is typically driven by both cis-acting changes to regulatory sequences and trans-acting changes to the concentration or activity of the proteins that bind them. Both of these effects are integrated at the level of protein-DNA binding. This binding can be quantified by ChIP-seq, but previous approaches to this problem have failed to differentiate cis- and trans-driven binding variation, blurring their separate impacts and complicating the discovery of their causal sequence variants. Furthermore, ChIP-seq suffers from low resolution, making it difficult to resolve the impact of individual binding sites, and is intrinsically limited to small numbers of factors, restricting its scope and making it difficult to study how factors interact. We propose to address these shortcomings by collecting quantitative, high-resolution, and comprehensive binding data in three closely related Drosophila species and their hybrids, centering this work around the transcription factor Mef2. Mef2 binds a wide range of targets and is unusually tractable, and we have found that it exhibits expression variation between our target species, potentially acting as a source of trans-acting variation. We will use ChIP-seq in interspecies hybrids to assess allele-specific binding of Mef2, allowing us to separate the contributions of cis- and trans-acting sequence variation to each binding event. We will observe how this variation acts upon individual sites by employing a recently described modification of the ChIP-seq protocol that achieves base-pair resolution. Finally, we will explore two complementary experimental methods that utilize new analysis tools to generate comprehensive protein-DNA binding data.
We will align these sources of data with matching RNA-seq data already generated by the Wittkopp lab, allowing us to deploy a series of statistical models to address several fundamental questions in the study of gene regulation, each either for the first time or in substantially greater depth. By better describing the link between sequence and expression, this work will contribute to the discovery of sequence variation underlying the characteristic misexpression patterns of different cancers and genetic diseases, helping to illuminate their root genetic causes. In particular, sequence variants creating binding sites for Mef2 have been implicated in lung cancer risk, and this work will help illustrate their relationship with tumorigenesis.
Many cancers and inherited diseases are thought to be caused by problems in the cryptic instructions in cells' DNA that dictate where, when, and in what quantity to produce each cell's basic building blocks. If we could better understand these instructions, then we could predict the root causes of these diseases, spurring the generation of new therapies. This project will take advantage of new techniques to address several fundamental questions regarding the nature of these instructions.
Lusk, Richard W (2014) Diverse and widespread contamination evident in the unmapped depths of high throughput sequencing data. PLoS One 9:e110808