One of the greatest challenges in animal biology is to learn how transcription factors read genomic sequence information to produce patterns of gene expression within the regulatory networks of developing embryos. Project 4 is part of a broader Program Project that will integrate computational modeling and wet laboratory methods to address this challenge, in the belief that only quantitative, predictive mathematical models that have been validated experimentally can provide the rigorous understanding needed to model animal transcriptional networks. Project 4's contribution to the overall program will be to rigorously and systematically evaluate different hypotheses about how combinations of transcription factors bound to cis-regulatory modules (CRMs) generate complex spatial and temporal patterns of expression. We will do this by developing a series of models that accurately predict, at single-nucleus resolution, patterns of transcription from protein concentration data and from in vivo DNA binding information obtained either from ChIP experiments or from models produced by Project 3. We will explore how CRM transcriptional output depends on sequence-level architecture (i.e., the configuration and orientation of individual recognition sites); on changes in protein concentration from nucleus to nucleus or from time point to time point; and on the structure and state of nucleosomes. Project 4 will greatly extend two initial models that we have developed. The first uses ordinary differential equations (ODEs) that take transcription factor protein and target gene mRNA expression data to predict in which cells of the embryo each factor activates or represses a given gene, as well as the degree of that regulation.
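As an illustration only, the following sketch assumes a connectionist "gene circuit" form for such an ODE model, dg/dt = R*sigmoid(sum_b w_b*v_b + h) - lam*g, for one target gene g in one nucleus with factor concentrations v_b. The abstract does not state the model's equations; the functional form, the helper names `simulate_nucleus` and `regulatory_roles`, and all parameter values are hypothetical. In this form the sign of each per-nucleus input w_b*v_b indicates whether factor b activates or represses the gene in that nucleus, and its magnitude indicates the degree of regulation. Estimating the weights w_b from protein and mRNA data is not shown.

```python
import math


def sigmoid(u):
    """Logistic regulation-expression function mapping total input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))


def simulate_nucleus(tf_conc, weights, h, R, lam, g0=0.0, dt=0.01, steps=1000):
    """Forward-Euler integration of dg/dt = R*sigmoid(sum_b w_b*v_b + h) - lam*g
    for one target gene in one nucleus, holding factor concentrations v_b fixed.
    (Assumed model form, for illustration only.)"""
    # Total regulatory input; constant here because tf_conc does not change.
    u = sum(w * v for w, v in zip(weights, tf_conc)) + h
    g = g0
    for _ in range(steps):
        g += dt * (R * sigmoid(u) - lam * g)
    return g


def regulatory_roles(tf_conc, weights):
    """Label each factor's input w_b*v_b in this nucleus as activating or
    repressing, with its absolute value as the degree of regulation."""
    roles = []
    for w, v in zip(weights, tf_conc):
        contribution = w * v
        if contribution > 0:
            roles.append(("activates", contribution))
        elif contribution < 0:
            roles.append(("represses", -contribution))
        else:
            roles.append(("none", 0.0))
    return roles
```

For example, with hypothetical weights [2.0, -3.0], h = -1.0, R = 1.0 and lam = 1.0, a nucleus containing only the activator settles near sigmoid(1), about 0.73, whereas adding 2.0 units of the repressor drives expression down to about sigmoid(-5), roughly 0.007.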
The second uses a generalized linear model (GLM) that fits in vivo DNA binding information to the transcription driven by CRMs in transgenic embryos, in order to learn how transcription factors interact within CRMs to drive complex spatiotemporal transcription patterns.
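A minimal sketch of a GLM of this kind, assuming a Poisson family with a log link: each row of X holds an intercept (column 0) plus the ChIP occupancy of each factor on a CRM, and y is a transcript count driven by that CRM. The family, link, and predictors are our assumptions rather than details given in the abstract, and `fit_poisson_glm` is a hypothetical helper. Coefficients are estimated by Newton's method, which for this canonical link is equivalent to iteratively reweighted least squares, with step halving so the log-likelihood never decreases.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def loglik(beta, X, y):
    """Poisson log-likelihood with a log link, dropping the constant log(y!) term."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        total += yi * eta - math.exp(eta)
    return total


def fit_poisson_glm(X, y, max_iter=100, tol=1e-8):
    """Damped Newton fit of log E[y] = X beta (assumes column 0 is an intercept
    and sum(y) > 0)."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the intercept-only fit
    ll = loglik(beta, X, y)
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]  # score X^T (y - mu)
                for k in range(p):
                    hess[j][k] += mu * xi[j] * xi[k]  # Fisher information X^T W X
        if max(abs(g) for g in grad) < tol:
            break  # score equations satisfied: beta is the MLE
        step = solve(hess, grad)
        t = 1.0
        while t > 1e-10:
            cand = [b + t * s for b, s in zip(beta, step)]
            cand_ll = loglik(cand, X, y)
            if cand_ll >= ll:
                break
            t /= 2.0  # step halving keeps the log-likelihood non-decreasing
        else:
            break  # no improving step at machine precision
        beta, ll = cand, cand_ll
    return beta
```

On noiseless synthetic data generated with coefficients (1.0, 1.5, -2.0) for the intercept, an activator's occupancy, and a repressor's occupancy, the fit recovers those values; under these assumptions a positive occupancy coefficient reads as net activation and a negative one as net repression.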
Aim 1 of this Project will extend our existing ODE model by also incorporating ChIP data on the average occupancy of each transcription factor across each CRM. This will yield, for each factor and genomic region, a probability that the factor regulates that region.
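One hypothetical way to turn ChIP occupancy into such probabilities is a logistic calibration. In the following sketch the occupancy is in arbitrary units, and the midpoint, slope, threshold, and factor and region labels are illustrative assumptions, not values from this Project. Links whose probability falls below the threshold could then be excluded (their ODE regulatory weights fixed at zero), which is one way such output could shrink the parameter space to be explored.

```python
import math


def occupancy_to_probability(occupancy, midpoint=1.0, slope=4.0):
    """Hypothetical logistic calibration from a factor's mean ChIP occupancy over a
    region (arbitrary units) to the probability that it regulates that region."""
    return 1.0 / (1.0 + math.exp(-slope * (occupancy - midpoint)))


def candidate_links(occupancy_table, threshold=0.5, midpoint=1.0, slope=4.0):
    """Keep (factor, region) links whose calibrated probability reaches threshold.

    Links that are dropped could have their ODE regulatory weights fixed at zero,
    reducing the number of parameters to be fitted."""
    links = {}
    for (factor, region), occupancy in occupancy_table.items():
        p = occupancy_to_probability(occupancy, midpoint, slope)
        if p >= threshold:
            links[(factor, region)] = p
    return links
```

With these illustrative settings, an occupancy of 3.0 maps to a probability near 1, an occupancy of 0.2 to about 0.04, and an occupancy equal to the midpoint to exactly 0.5.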
Aim 2 will develop generalized linear mixed models (GLMMs) to aggregate a range of postulated causal interactions (e.g., homomeric cooperativity, local repression, and architecture-specific effects), while using the output of Aim 1 to restrict the space of parameters that we need to explore. Our models will be validated in collaboration with Project 2 and the Expression and Database Core, using transgenic constructs to determine how modifying the affinity and location of transcription factor recognition sites within bona fide CRMs affects transcription, and to discover which genomic regions bound by factors in vivo are functional CRMs and which represent low-level, nonfunctional interactions. By helping to establish how transcriptional information is read in animal genomes, this Project will aid both the development of therapeutics for human genetic diseases and the understanding of animal development.
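Interaction classes like those listed for Aim 2 have to be encoded as covariates before a GLMM can weigh them. The following hypothetical sketch counts three such features from annotated recognition sites in one CRM: homotypic site pairs (a proxy for homomeric cooperativity), activator sites with a nearby repressor site (local repression), and same-strand activator-repressor pairs (an orientation-dependent architecture term). The `Site` record, the factor labels "act" and "rep", and the distance cutoffs are assumptions for illustration. In a GLMM these counts would enter as fixed effects alongside a random effect for each CRM or transgenic line; fitting that model is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One annotated recognition site in a CRM (hypothetical record)."""
    factor: str    # factor that binds this site
    position: int  # site center, in bp from the CRM start
    strand: str    # "+" or "-" orientation of the site


def homotypic_pairs(sites, factor, max_gap):
    """Count pairs of same-factor sites within max_gap bp (cooperativity proxy)."""
    pos = sorted(s.position for s in sites if s.factor == factor)
    return sum(1 for i in range(len(pos)) for j in range(i + 1, len(pos))
               if pos[j] - pos[i] <= max_gap)


def local_repression(sites, repressor, activator, max_gap):
    """Count activator sites with at least one repressor site within max_gap bp."""
    rep_pos = [s.position for s in sites if s.factor == repressor]
    return sum(1 for s in sites if s.factor == activator
               and any(abs(s.position - r) <= max_gap for r in rep_pos))


def same_strand_pairs(sites, factor_a, factor_b, max_gap):
    """Count nearby pairs of sites for two different factors that share a strand."""
    a_sites = [s for s in sites if s.factor == factor_a]
    b_sites = [s for s in sites if s.factor == factor_b]
    return sum(1 for a in a_sites for b in b_sites
               if abs(a.position - b.position) <= max_gap and a.strand == b.strand)


def crm_features(sites):
    """Fixed-effect covariates for one CRM; labels and cutoffs are illustrative."""
    return {
        "act_homotypic": homotypic_pairs(sites, "act", max_gap=50),
        "rep_near_act": local_repression(sites, "rep", "act", max_gap=100),
        "act_rep_same_strand": same_strand_pairs(sites, "act", "rep", max_gap=100),
    }
```

Mutating a site's position or strand in a transgenic construct changes these counts, so comparing predicted and measured transcription across such variants is one way the fitted effects could be tested.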

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Program Projects (P01)
Project #
1P01GM099655-01
Application #
8262275
Study Section
Special Emphasis Panel (ZRG1-GGG-H (40))
Budget Start
2012-09-13
Budget End
2013-06-30
Support Year
1
Fiscal Year
2012
Total Cost
$229,853
Indirect Cost
$76,741
Name
Lawrence Berkeley National Laboratory
DUNS #
078576738
City
Berkeley
State
CA
Country
United States
Zip Code
94720