The deluge of sequencing-based functional genomic data profiling the transcriptome, regulome and epigenome in hundreds of diverse cellular contexts has spurred the development of powerful computational methods to learn integrative models of gene regulation. However, models learned on data from static cellular contexts only reveal correlative regulatory relationships. There is a paucity of tractable model systems and experiments profiling dynamic cellular processes and a corresponding lack of computational methods that can learn putative causal mechanisms controlling the precise timing and temporal order of changes in genomic chromatin state and gene expression. Here, we propose novel machine learning methods to learn dynamic models of transcription regulation in the context of cellular reprogramming.
In Aim 1, we propose deep learning frameworks with new interpretation engines that can integrate dynamic chromatin accessibility and gene expression data to reveal networks of cis-regulatory elements, transcription factor binding complexes and cascades of trans-acting regulatory factors that control cell fate.
In Aim 2, we will apply our modeling framework to investigate the early dynamics of nuclear reprogramming of human fibroblasts to pluripotency. We will leverage a powerful heterokaryon cell fusion model system to generate global chromatin and gene expression profiles over a two-day time course.
In Aim 3, we will perform perturbation experiments using RNAi and CRISPR/Cas9 technologies to validate hypotheses generated by our models and to test the effectiveness of predicted pluripotency factors and regulatory elements in inducing reprogramming. The validation experiments will further be used to iteratively refine the computational models. We will integrate the time-course data generated in our model system with data from large reference compendia of functional genomic data such as the Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project. Our analyses will reveal molecular mechanisms crucial to early and transient stages of nuclear reprogramming, providing novel contributions to our fundamental knowledge of regenerative medicine. Finally, the proposed end-to-end integrative framework is highly generalizable and will be of broad utility for learning dynamic models of transcriptional regulation from time-course datasets in other model systems.

Public Health Relevance

Understanding how cells precisely control the timing and order of gene expression changes during dynamic cellular processes remains a challenge. Here, we propose the development of novel machine learning methods based on deep neural networks to decipher dynamic regulatory networks controlling cell fate decisions. We will generate genome-wide temporal profiles of chromatin state and gene expression during nuclear reprogramming, learn integrative models of dynamic gene regulation and perform experiments to validate biological hypotheses generated by the model.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Pazin, Michael J
Stanford University
Schools of Medicine
United States