Understanding how genes are turned on and off, and how their precise levels of expression are regulated, is critical to describing the connection between genetic variations and human health. Ongoing community-wide efforts promise to catalog vast amounts of data about genomic states in a variety of conditions, including specific disease states. Such catalogs are expected to help identify key regulators of condition-specific gene expression. However, the ultimate dream of `reading' the DNA sequence and accurately predicting expression levels in any given cell is likely to remain elusive. We propose to develop advanced computational tools that will help biologists and genome scientists realize this final goal of predicting gene expression levels from sequence.
The first and main goal of this proposal is to build a software system that will help a biologist model how gene expression relates to regulatory sequences. Here, `model' refers to describing the relationship between sequence and expression in a quantitative language, with a very high level of accuracy. The proposed software system, to be called `GEM' (Gene Expression Modeling), will consolidate our efforts in this direction over the last five years, and will also incorporate novel biochemical aspects into the model. In a departure from the norm in this field, the proposed software will present to the biologist all models consistent with the collected data, and not just the single most agreeable model. In other words, the scientist will get to see all possible interpretations of their data in terms of gene regulatory interactions in the cell.
The second aim is devoted to presenting the model to the scientist in easily interpretable formats, including a variety of visual representations. The goal here is to connect the typically quantitative and abstract form of the above-mentioned models to the more tangible notions the biologist has about gene regulation mechanisms.
The third aim of this proposal is to help the biologist improve the models created in Aim 1, either by hypothesizing the existence of hitherto unknown regulators of the gene, or by generating additional data. The software system will use rigorous statistical methods and objective criteria to help the biologist decide which experiments should be most productive in advancing their understanding of the gene regulatory system.
All specific aims will be evaluated on four important regulatory systems from insects and mammals.

Public Health Relevance

Understanding how genes are turned on and off, and how their precise levels of expression are regulated, is critical to describing the connection between genetic variations and human health. We propose to build a software system to help a biologist build quantitative descriptions of the molecular interactions that control gene expression. Such descriptions will help us realize the goal of predicting the impact of DNA mutations on gene expression levels, and consequently on an individual's predisposition or response to disease conditions.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM114341-03
Application #
9198038
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2015-03-15
Project End
2018-12-31
Budget Start
2017-01-01
Budget End
2017-12-31
Support Year
3
Fiscal Year
2017
Name
University of Illinois Urbana-Champaign
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
041544081
City
Champaign
State
IL
Country
United States
Zip Code
61820
Hanson, Casey; Cairns, Junmei; Wang, Liewei et al. (2018) Principled multi-omic analysis reveals gene regulatory mechanisms of phenotype variation. Genome Res 28:1207-1216
Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave et al. (2018) A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models. Sci Rep 8:6620
Yang, Wei; Sinha, Saurabh (2017) A novel method for predicting activity of cis-regulatory modules, based on a diverse training set. Bioinformatics 33:1-7
Samee, Md Abul Hassan; Lydiard-Martin, Tara; Biette, Kelly M et al. (2017) Quantitative Measurement and Thermodynamic Modeling of Fused Enhancers Support a Two-Tiered Mechanism for Interpreting Regulatory DNA. Cell Rep 21:236-245
Emad, Amin; Cairns, Junmei; Kalari, Krishna R et al. (2017) Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance. Genome Biol 18:153
Khoueiry, Pierre; Girardot, Charles; Ciglar, Lucia et al. (2017) Uncoupling evolutionary changes in DNA sequence, transcription factor occupancy and enhancer activity. Elife 6:
Peng, Pei-Chen; Sinha, Saurabh (2016) Quantitative modeling of gene expression using DNA shape features of binding sites. Nucleic Acids Res 44:e120
Samee, Md Abul Hassan; Lim, Bomyi; Samper, Núria et al. (2015) A Systematic Ensemble Approach to Thermodynamic Modeling of Gene Expression from Sequence Data. Cell Syst 1:396-407