This project has three major components.

Analysis of matched accessibility, expression and 3D contact data in diverse contexts. In this component we will develop statistical methods for modeling gene regulatory relations based on the joint analysis of gene expression data (from RNA-seq), chromatin accessibility data (from ATAC-seq or DNase-seq) and 3D interaction data (from Hi-C or HiChIP) from diverse cellular contexts. The focus will be on "matched data sets", where the different types of omics assays are performed on the same or closely matched cellular contexts. The methodology should remain applicable when some data types are missing in a substantial subset of contexts, and should be able to incorporate a wide variety of context-independent data.

Analysis of matched omics data in time courses. In this component we will develop an approach to modeling time course data that can use prior information provided by a general regulatory model learned from diverse contexts. The emphasis will be on experiments in which expression, accessibility and 3D contact data are generated at each time point to study a specific biological process.

Joint analysis of multiple types of single cell omics data. In this component we will develop statistical methods for the joint analysis of single cell omics data on expression, accessibility and 3D contact. Specifically, we are interested in the situation where multiple types of single cell omics data are generated from the same heterogeneous population of cells. We will develop statistical methods to resolve the population into relevant subpopulations and to infer subpopulation-specific gene regulatory relations. We will also develop methods to handle the case where the 3D contact data come from bulk samples.
The goal of this research is to develop new statistical methods for the analysis of gene regulatory mechanisms based on genome-wide data on chromatin accessibility, chromatin 3D conformation and gene expression. We will consider situations where data are available in diverse cellular contexts or in time course experiments, and will develop methods for analyzing bulk data as well as single cell data. Results from this research will enable more effective analysis of the gene regulatory mechanisms relevant to development and disease.