The three-dimensional organization of the genome plays a major role in long-range gene regulation, in which regulatory elements such as enhancers affect the expression of genes hundreds of kilobases away. Changes in three-dimensional organization are associated with tissue-specific gene expression and have been implicated in several human diseases, including cancer, diabetes and obesity. Advances in chromosome conformation capture (3C) technologies have expanded the known repertoire of long-range interactions between enhancers and promoters in model cell lines and have shown that such interactions are established through a complex interplay of chromatin state, transcription factor binding and three-dimensional proximity of genomic regions. However, our current understanding of the dynamics of long-range gene regulation is limited, both across cell types and across species. Three factors contribute: such datasets are absent for most species and cell types, systematic methods to predict and interpret these interactions are lacking, and few approaches exist to compare regions and their interactions across cell types and especially across species. The overarching goals of this proposal are to develop novel computational methods to jointly identify candidate regulatory elements in multiple species and to predict their long-range interactions in new cell types and species where high-throughput 3C datasets are unavailable or difficult to obtain.
In Aim 1, we will develop a phylogenetically aware method of jointly identifying regulatory elements such as enhancers in multiple species.
In Aim 2, we will develop multi-task and transfer learning approaches to predict interactions in new species and cell types by integrating available high-throughput 3C datasets from multiple cell types and 3C platforms.
In Aim 3, we will collect a novel multi-species chromatin mark dataset in species-specific endothelial cells to enable a systematic study of long-range gene regulation dynamics. We will apply the computational approaches developed in Aims 1 and 2 to this multi-species epigenomic dataset to identify different regulatory elements and predict long-range interactions in multiple species. We will develop rigorous computational measures, based on published 3C datasets, to evaluate the quality of predictions from our novel methods and their improvement over existing methods. We will further experimentally validate predicted interactions using Capture-HiC in multiple species and using CRISPR/Cas9 experiments. We will examine individual interactions and groups of interactions to identify species-specific and clade-specific interactions and interpret the corresponding genes in the context of known pathways and curated gene sets associated with cardiovascular diseases. Our methods will be widely applicable to dissecting long-range gene regulation in complex phenotypes, including diseases. Software tools, resources, original data and experimental protocols developed by this project will be made publicly available.

Public Health Relevance

Long-range gene regulatory interactions are emerging as important determinants of tissue-specific gene expression and are often disrupted in diseases, including cancer. Such interactions occur between distally located regulatory sequence elements and genes hundreds of kilobases away. Currently, our understanding of long-range gene regulation is limited to a few cell types and model organisms. Computational methods to identify regulatory elements and systematically link them to target genes in diverse cell types and mammalian species can significantly improve our understanding of the impact of long-range gene regulation in human diseases, help interpret regulatory variation in non-coding parts of the genome and assist in the development of better biomarkers.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG010045-03
Application #
9996761
Study Section
Genetic Variation and Evolution Study Section (GVE)
Program Officer
Gilchrist, Daniel A
Project Start
2018-09-17
Project End
2022-06-30
Budget Start
2020-07-01
Budget End
2021-06-30
Support Year
3
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Wisconsin Madison
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
161202122
City
Madison
State
WI
Country
United States
Zip Code
53715