Repetitive transposable elements (TEs) comprise over 50% of the human genome. While some investigators regard TEs as "parasitic" DNA, other studies suggest that TEs play a more constructive role in genome evolution by providing raw material for new biological functions. For example, TEs commonly harbor active cis-regulatory elements that are occasionally co-opted during evolution to wire new gene regulatory networks. While investigators now recognize the importance of TEs in gene regulation, TEs remain under-analyzed in high-throughput data because of methodological hurdles associated with their repetitive nature. Thus, the impact of TEs on the regulation of the human genome, both in normal development and disease, remains largely uncharacterized. We propose to develop novel computational methods to assess and clarify the impact of TEs in regulatory innovation using ENCODE data.
In Specific Aim 1 we will develop new algorithms and statistical methods to predict active regulatory elements encoded by TEs from heterogeneous ENCODE data. If successful, we will generate a profile of TE-derived regulatory elements and their predicted targets across diverse cell/tissue types and developmental stages, revealing new gene regulatory networks wired by TEs. With these new methods we also intend to examine the extent of TE dysregulation in cancer cells and its transcriptional consequences.
In Specific Aim 2 we will extend the models developed in Aim 1 to understand the role of TEs in shaping the 3D topology of the genome, which is intimately connected to genome function. We will investigate the role of TEs in partitioning the genome into chromosomal domains that orchestrate communication between cis-regulatory elements and their target genes. In particular, we will quantify the extent to which TEs drive conservation and divergence in genome topology across mammal species.
In Specific Aim 3 we will take advantage of the repetitive nature of TEs to develop a novel statistical model that links sequence changes in different copies of TEs to epigenetic and functional differences. The numerous but slightly different copies of a TE present in a single genome provide a unique opportunity to identify sequence variants that underlie epigenetic modification, which will further our understanding of how TEs become co-opted for host gene regulation.
Finally, in Specific Aim 4, we will deploy our recently developed Repeat Element Browser as a web portal and downloadable application specifically tailored for investigators to analyze, visualize, and explore ENCODE data, data from other sources, and their own data in the context of TEs. The methods developed in this proposal will have a high impact on the utility of the data produced by ENCODE and will greatly expand our understanding of the contribution of TEs to non-coding regulatory elements in healthy tissues and disease.
Transposable elements (TEs) are a special class of short DNA sequences that copy and paste themselves to new locations in the genome. Through repeated copying and pasting, TEs now comprise over 50% of the human genome sequence. When TEs paste themselves near genes, they can have profound effects on the way those genes are regulated, both in health and disease. Despite their importance, TEs remain poorly characterized. The same property that makes them special, namely their ability to copy and paste across the genome, makes them highly repetitive and therefore recalcitrant to large-scale analyses, such as those of the ENCODE project. To address this problem, we propose to develop a new set of computational methods to profile the regulatory activity of human TEs across anatomical and developmental space, taking advantage of comparisons between human and mouse to study the impact of TEs in regulatory evolution. We will use this comprehensive profile generated from healthy cells and tissues to identify TE mis-regulation in disease, including cancer, and its regulatory consequences.