We propose to develop novel computational methods to identify regulatory elements in repetitive regions of the genome using publicly available data from Roadmap Epigenomics and ENCODE. Half of the human genome is derived from transposable elements (TEs). These highly repetitive elements were recently shown to harbor transcription factor (TF) binding sites and epigenetic regulatory signals. TEs have shaped gene regulatory networks during evolution and are dysregulated in many diseases. However, the extent to which TEs contribute to regulatory networks, and how TE sequences evolved from parasitic DNA into functional elements, remain unclear. In this proposal, we introduce a computational framework to identify TE-derived cell type-specific enhancers and to estimate the evolutionary contribution of TEs to cell type-specific gene regulation.
In Specific Aim 1, we will develop an epigenomics-based approach to detect TE-derived enhancers and their target genes. Extending our recent success in developing machine learning methods to integrate DNA methylation data, we will build computational models that predict TE-derived enhancers. If successful, we will not only produce the largest catalog of TE-derived cell type-specific enhancers, but also create a robust framework for detecting the contributions of TEs to gene regulation in any cell type or tissue.
In Specific Aim 2, we will develop a TE epigenetic association assay. By taking advantage of the multi-copy nature of TE sequences, we will identify TE sequence variations or features that associate with specific epigenetic and/or TF binding patterns. We will reconstruct the sequences of evolutionary intermediates of candidate TEs and estimate their epigenetic and/or TF binding patterns. We will address questions including whether particular classes of TEs gained TF binding sites and then spread quickly, or whether TEs first spread and later gained TF binding sites. If successful, we will develop an understanding of which sequence features drive the functional potential of TEs, and of the modes of evolution followed by different TEs during regulatory network evolution. Such an understanding will substantially improve our picture of regulatory network evolution by including the effects of TEs, a major class of fast-evolving regulatory sequences that have been largely ignored in functional genomics studies.
In Specific Aim 3, we will create a public resource based on our newly developed Repeat Element Browser to allow investigators to display, analyze, compare, and integrate Roadmap/ENCODE data and their own data on TEs. The methods developed in this proposal will greatly increase the utility of data produced by consortia such as ENCODE, Roadmap Epigenomics, and TCGA, which currently discard most TE-derived sequences. This improvement will in turn accelerate research into the impact of TEs on normal gene regulation and on human disease.

Public Health Relevance

Transposable elements (TEs) are a special class of DNA sequences that copy themselves and hop to many different locations in the genome. TEs are often referred to as junk DNA or parasitic DNA, but they are increasingly implicated in genome evolution, gene regulation, and disease. These elements comprise a large fraction of the DNA in mammalian genomes, including 50% of the human genome. Because of their repetitive nature, they are discarded in most genomics studies. We recently showed that these elements often carry regulatory sequences that are co-opted by host genomes to perform normal gene regulation. Here we propose to study the extent to which TEs contribute to normal gene regulation throughout the genome and how misregulation of TE-derived sequences contributes to disease.

National Institutes of Health (NIH)
National Institute of Environmental Health Sciences (NIEHS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-IMST-R (51))
Program Officer: Chadwick, Lisa
Washington University
Schools of Medicine
Saint Louis
United States
Gu, Junchen; Stevens, Michael; Xing, Xiaoyun et al. (2016) Mapping of Variable DNA Methylation Across Multiple Cell Types Defines a Dynamic Regulatory Landscape of the Human Genome. G3 (Bethesda) 6:973-86
Lowdon, Rebecca F; Jang, Hyo Sik; Wang, Ting (2016) Evolution of Epigenetic Regulation in Vertebrate Genomes. Trends Genet 32:269-83
Nelson, E C; Agrawal, A; Heath, A C et al. (2016) Evidence of CNIH3 involvement in opioid dependence. Mol Psychiatry 21:608-14
Carey, Caitlin E; Agrawal, Arpana; Zhang, Bo et al. (2015) Monoacylglycerol lipase (MGLL) polymorphism rs604300 interacts with childhood adversity to predict cannabis dependence symptoms and amygdala habituation: Evidence from an endocannabinoid system-level analysis. J Abnorm Psychol 124:860-77
Fonseca, Tatiana L; Fernandes, Gustavo W; McAninch, Elizabeth A et al. (2015) Perinatal deiodinase 2 expression in hepatocytes defines epigenetic susceptibility to liver steatosis and obesity. Proc Natl Acad Sci U S A 112:14018-23
Gascard, Philippe; Bilenky, Misha; Sigaroudinia, Mahvash et al. (2015) Epigenetic and transcriptional determinants of the human breast. Nat Commun 6:6351
Zhou, Xin; Li, Daofeng; Zhang, Bo et al. (2015) Epigenomic annotation of genetic variants using the Roadmap Epigenome Browser. Nat Biotechnol 33:345-6
Lee, Hyung Joo; Lowdon, Rebecca F; Maricque, Brett et al. (2015) Developmental enhancers revealed by extensive DNA methylome maps of zebrafish early embryos. Nat Commun 6:6315
Li, Daofeng; Zhang, Bo; Xing, Xiaoyun et al. (2015) Combining MeDIP-seq and MRE-seq to investigate genome-wide CpG methylation. Methods 72:29-40
Hochner, Hagit; Allard, Catherine; Granot-Hershkovitz, Einat et al. (2015) Parent-of-Origin Effects of the APOB Gene on Adiposity in Young Adults. PLoS Genet 11:e1005573

Showing the most recent 10 out of 18 publications