Understanding how genes are turned on and off, and how their precise levels of expression are regulated, is critical to describing the connection between genetic variations and human health. Ongoing community-wide efforts promise to catalog vast amounts of information (data) about genomic states in a variety of conditions, including specific disease states. Such catalogs are expected to help identify key regulators of condition-specific gene expression. However, the ultimate dream of `reading' the DNA sequence and accurately predicting expression levels in any given cell is likely to remain elusive. We propose to develop advanced computational tools that will help biologists and genome scientists realize this final goal of predicting gene expression levels from sequence.
The first and main goal of this proposal is to build a software system that will help a biologist model how gene expression relates to regulatory sequences. Here, `model' refers to describing the relationship between sequence and expression in a quantitative language, with a very high level of accuracy. The proposed software system, to be called `GEM' (Gene Expression Modeling), will consolidate our efforts in this direction over the last five years, and will also incorporate novel biochemical aspects into the model. In a departure from the norm in this field, the proposed software will present to the biologist all models consistent with the collected data, and not just the single most agreeable model. In other words, the scientist will get to see all possible interpretations of their data in terms of gene regulatory interactions in the cell.
The second aim is devoted to presenting the model to the scientist in easily interpretable formats, including a variety of visual representations. The goal here is to connect the typically quantitative and abstract form of the above-mentioned models to the more tangible notions the biologist has about gene regulation mechanisms.
The third aim of this proposal is to help the biologist improve the models created in Aim 1, either by hypothesizing the existence of hitherto unknown regulators of the gene, or by generating additional data. The software system will use rigorous statistical methods and objective criteria to help the biologist decide which experiments are likely to be most productive in advancing their understanding of the gene regulatory system.
All specific aims will be evaluated on four important regulatory systems from insects and mammals.
Understanding how genes are turned on and off, and how their precise levels of expression are regulated, is critical to describing the connection between genetic variations and human health. We propose to build a software system to help a biologist build quantitative descriptions of the molecular interactions that control gene expression. Such descriptions will help us realize the goal of predicting the impact of DNA mutations on gene expression levels, and consequently on an individual's predisposition or response to disease conditions.