The genome provides the blueprint for the proper functioning of cells, and in complex organisms such as vertebrates, it is nearly identical across most of the cells that make up an individual. Despite this, cells vary widely in shape and function. Understanding how individual cells acquire and then maintain their identity and function, and how cells communicate with each other to coordinate the development of the whole organism, is of keen interest to developmental biologists. This project has two goals. First, it develops computational tools for 1) characterizing the impact of cell-cell communication on molecular function, 2) measuring biological variation in molecular function between cells collected from different tissues or individuals, and 3) predicting experimental strategies for manipulating cell identity and function. Second, it trains high school, undergraduate, and graduate students to use these tools and data analysis techniques, and it develops approaches for engaging students in interdisciplinary, team-based genomics research. In this way, the project contributes to the broader goal of training the next generation of data scientists to address important problems in biology using genomics technologies.

Recent developments in DNA sequencing technologies enable the measurement of different dynamic aspects of gene regulation across a wide spectrum of organisms. For each segment of DNA in a genome, we can now capture a snapshot of its physical accessibility, measure its relative rate of transcription into RNA, locate reversible chemical modifications to the DNA itself or to the histone proteins that package it, and even identify distal DNA segments that are in physical contact with it. The research goal of this project is to quantitatively characterize the mechanisms by which signals from intrinsic and extrinsic factors are integrated to drive variation in gene and chromatin regulation and, ultimately, to define cell identity and its dynamics. Specifically, it develops tools based on deep neural networks that perform in silico perturbations of cells in order to identify regulators of transcriptional cell state, identify regulatory pathways underlying cellular responses to stimuli, and characterize the effect of cell-cell communication on gene regulation. The educational goal of this project is to develop scalable strategies for training the next generation of genome data scientists at the high school, undergraduate, and graduate levels to use these tools to address diverse problems in biology through an interdisciplinary, team-based science approach. The results of this work can be found at

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Biological Infrastructure (DBI)
Program Officer: Jean Gao
University of California Davis, United States