A fundamental mystery in genome biology is how the three billion base pairs of a mammalian genome (approximately 2 meters of DNA) are folded, looped, and coiled to fit into a cell nucleus that is roughly 5-10 microns in diameter. Over the last few years, advances in sequencing technologies have driven rapid progress in our understanding of how the genome folds in three dimensions (3D). The goal of this project is to develop mathematical models that will provide a deeper understanding of how 3D genome structure is connected to gene expression in healthy development, and how these folding patterns go awry during the onset and progression of disease.

This project aims to shed new light on the organizing principles governing genome folding, using data from chromatin conformation capture experiments. To date, no clear best-practice computational methods exist for comparing genome organization across cell types or biological perturbations. The project will develop mathematical models and computational methods to gain new insight into how the genetic material folds in different cellular states, and to sensitively detect how these folding patterns are dynamically altered by biological perturbations such as drugs, growth factors, and genome editing. It focuses on detecting dynamic changes in two broad categories of 3D chromatin features: (1) sub-megabase topologically associating domains, which appear as blocks in a contact map, and (2) precise long-range interactions between two distant genomic loci, which cause the intervening genomic DNA to loop out. Both parametric and non-parametric normalization approaches for elucidating these features will be explored and benchmarked. Models for these features will be developed, leading to scan statistics for identifying them in normalized 3D contact maps. Methods for controlling the false discovery rate of these scan statistics will be developed based on analysis of heterogeneous Poisson fields.
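As a rough illustration of what a scan statistic on a contact map can look like, the sketch below slides a small square window over a simulated map, compares the observed count in each window with a background expectation, and screens the resulting p-values. Everything in it is an assumption made for illustration: the function names, the distance-decay background `50 / (1 + |i - j|)`, the window size, and the planted loop are all hypothetical, and the Benjamini-Hochberg step-up procedure is used only as a simple stand-in. It ignores the dependence between overlapping windows and is not the Poisson-field analysis the project proposes.

```python
import math
import random


def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), summed directly from the upper tail."""
    if k <= 0:
        return 1.0
    if mu <= 0.0:
        return 0.0
    log_mu = math.log(mu)
    total, x = 0.0, k
    while True:
        term = math.exp(x * log_mu - mu - math.lgamma(x + 1))
        total += term
        # Past the mode, stop once further terms no longer change the sum.
        if x > mu and term <= 1e-17 * total:
            break
        x += 1
    return min(1.0, total)


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's method (fine for small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def scan_contact_map(counts, expected, width, min_sep):
    """Score every width x width window at least min_sep bins off the diagonal.

    Returns a list of (i, j, p_value), where (i, j) is the window's top-left bin.
    Requires min_sep >= width so that windows stay in the upper triangle.
    """
    n = len(counts)
    results = []
    for i in range(n - width + 1):
        for j in range(i + min_sep, n - width + 1):
            cells = [(a, b) for a in range(i, i + width) for b in range(j, j + width)]
            observed = sum(counts[a][b] for a, b in cells)
            background = sum(expected[a][b] for a, b in cells)
            results.append((i, j, poisson_upper_tail(observed, background)))
    return results


def benjamini_hochberg(pvalues, alpha):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Simulated example (all values hypothetical): 40 bins, background contact rate
# decaying with genomic distance, and one planted loop at bins (10-11, 30-31).
rng = random.Random(0)
n, width, min_sep = 40, 2, 4
expected = [[50.0 / (1 + abs(i - j)) if j > i else 0.0 for j in range(n)]
            for i in range(n)]
rate = [row[:] for row in expected]
for a in (10, 11):
    for b in (30, 31):
        rate[a][b] *= 10.0  # tenfold enrichment at the planted loop
counts = [[sample_poisson(rate[i][j], rng) for j in range(n)] for i in range(n)]

windows = scan_contact_map(counts, expected, width, min_sep)
rejected = benjamini_hochberg([p for _, _, p in windows], alpha=0.05)
detected = {(windows[idx][0], windows[idx][1]) for idx in rejected}
print(sorted(detected))
```

In this toy setting the background is known exactly. With real data it would have to be estimated by a normalization step, and windows that partly overlap the planted loop may also be flagged, which is one reason a field-based treatment of the multiple-testing problem is attractive.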

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1562665
Program Officer: Pedro Embid
Project Start:
Project End:
Budget Start: 2016-05-01
Budget End: 2021-04-30
Support Year:
Fiscal Year: 2015
Total Cost: $1,395,955
Indirect Cost:
Name: University of Pennsylvania
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104