This project has three major components.

Analysis of matched accessibility, expression and 3D contact data in diverse contexts: In this component we will develop statistical methods for modeling gene regulatory relations based on the joint analysis of gene expression data (from RNA-seq), chromatin accessibility data (from ATAC-seq or DNase-seq) and 3D interaction data (from Hi-C or HiChIP) from diverse cellular contexts. The focus will be on "matched data sets", where the different types of omics assays are performed on the same or closely matched cellular contexts. The methodology should be applicable to cases where some data types are missing in a substantial subset of contexts. The methodology should also be able to incorporate a large variety of non-context-dependent data.

Analysis of matched omics data in time courses: In this component we will develop an approach to the modeling of time course data that is capable of utilizing prior information provided by a general regulatory model learned from diverse contexts. The emphasis will be on experiments where expression, accessibility and 3D contact data are generated at each time point to study a specific biological process.

Joint analysis of multiple types of single cell omics data: In this component we will develop statistical methods for the joint analysis of single cell omics data on expression, accessibility, and 3D contact. Specifically, we are interested in the situation where multiple types of single cell omics data are generated from the same heterogeneous population of cells. We will develop statistical methods to resolve the population into relevant subpopulations and to infer subpopulation-specific gene regulatory relations. We will also develop methods to handle the case where the 3D contact data come from bulk samples.

Public Health Relevance

The goal of this research is to develop new statistical methods for the analysis of gene regulatory mechanisms based on genome-wide data on chromatin accessibility, chromatin 3D conformation and gene expression. We will consider situations when data is available in diverse cellular contexts or in time course experiments, and will develop methods for analyzing bulk data as well as single cell data. Results from this research will enable more effective analysis of the gene regulatory mechanisms relevant to development and disease.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG010359-02
Application #: 10001015
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Gilchrist, Daniel A
Project Start: 2019-08-22
Project End: 2023-06-30
Budget Start: 2020-07-01
Budget End: 2021-06-30
Support Year: 2
Fiscal Year: 2020
Total Cost:
Indirect Cost:

Name: Stanford University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 009214214
City: Stanford
State: CA
Country: United States
Zip Code: 94305