A central theme in post-genomic research is the dissection and understanding of the complex dynamical interactions involved in gene regulation. Recent technological advances have enabled biologists to investigate gene expression and regulation in single cells on a whole-genome scale, revealing that the production of mRNA and protein from a gene is a stochastic process. While such experimental studies have provided valuable insights into gene regulation, a computational approach based on stochastic methods is essential to understanding dynamic gene interactions, since intuitive interpretations of experimental results cannot fully expose the dynamics of a gene network. However, current stochastic approaches lack sophisticated modeling techniques, analytical methods, and efficient simulation algorithms.
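To make this stochasticity concrete, the sketch below simulates the simplest textbook model of gene expression. In this model, mRNA is transcribed at a constant rate `k` and each molecule is degraded at rate `gamma`. The simulation uses Gillespie's direct method, the exact stochastic simulation algorithm. The model, the function names, and the parameter values are illustrative assumptions, not models taken from this project.

```python
import random


def gillespie_birth_death(k, gamma, t_end, m0=0, seed=None):
    """Gillespie direct method for transcription (rate k) and
    first-order mRNA degradation (rate gamma per molecule)."""
    rng = random.Random(seed)
    t, m = 0.0, m0
    times, counts = [t], [m]
    while True:
        a_tx = k              # propensity of transcription
        a_deg = gamma * m     # propensity of degradation
        a_total = a_tx + a_deg
        if a_total == 0:      # no reaction can fire
            break
        t += rng.expovariate(a_total)  # exponential waiting time to next event
        if t > t_end:
            break
        # choose which reaction fires, with probability proportional to its propensity
        if rng.random() * a_total < a_tx:
            m += 1
        else:
            m -= 1
        times.append(t)
        counts.append(m)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean of a piecewise-constant trajectory on [0, t_end]."""
    edges = times[1:] + [t_end]
    total = sum(c * (t1 - t0) for c, t0, t1 in zip(counts, times, edges))
    return total / t_end


times, counts = gillespie_birth_death(k=10.0, gamma=1.0, t_end=2000.0, seed=1)
print(f"time-averaged mRNA count: {time_average(times, counts, 2000.0):.2f}")
```

For this linear model, the stationary copy number is Poisson with mean k/gamma. The time-averaged count should therefore settle near 10, even though any single trajectory keeps fluctuating around that value.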

This research employs a module-based approach to modeling large gene networks, similar to that used in analyzing electronic circuits, and aims to develop computational tools for analyzing and simulating gene networks. The investigator pursues three research thrusts: 1) derive a general stochastic model for gene expression and statistical methods for estimating its parameters from experimental data, 2) characterize common modules in gene networks and develop a module-based approach to analyzing gene networks, and 3) devise efficient stochastic simulation methods for large gene networks. The computational approach employed in this project could have a major impact on biological research by shedding light on many open questions about stochasticity in gene regulation. The computational tools developed in this project will find numerous applications in biological research, including drug design, the design of synthetic gene circuits, and the investigation of diseases such as cancer. The methods devised in this research can also be employed to simulate communication networks and to design or optimize network protocols.
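Thrust 1 concerns estimating model parameters from experimental data. One common illustration is the method of moments for bursty gene expression; this is not necessarily the estimator developed in this project. Assume transcripts are produced in geometrically distributed bursts of mean size b, at a frequency a measured in units of the mRNA degradation rate. The stationary copy number is then negative binomial with mean ab and variance ab(1+b), so the Fano factor (variance/mean) equals 1+b. The sketch below inverts these two relations on synthetic single-cell counts. The model, the function name, and the data are hypothetical.

```python
import numpy as np


def estimate_burst_parameters(counts):
    """Method-of-moments estimates (a_hat, b_hat), assuming counts are
    negative binomial with mean a*b and variance a*b*(1 + b)."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)
    b_hat = var / mean - 1.0   # Fano factor minus one gives the mean burst size
    if b_hat <= 0:
        raise ValueError("counts are not over-dispersed; bursty model does not apply")
    a_hat = mean / b_hat       # burst frequency from mean = a * b
    return a_hat, b_hat


# Synthetic data: a gamma-Poisson mixture is negative binomial with mean a*b.
rng = np.random.default_rng(0)
a_true, b_true = 5.0, 4.0
lam = rng.gamma(shape=a_true, scale=b_true, size=20000)
counts = rng.poisson(lam)
a_hat, b_hat = estimate_burst_parameters(counts)
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.2f}")
```

Moment estimators are simple and fast, but they use only the first two moments. Likelihood-based methods can use the full distribution of counts.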

Project Report

Genes in living cells interact with one another to form complex networks. Understanding the structure and dynamics of gene networks is crucial to understanding gene function, various cellular processes, and the development of diseases such as cancer. The major goals of this project were to develop novel stochastic models, analytical approaches, and efficient stochastic simulation methods for analyzing gene networks, and to integrate the related research with education. These goals were achieved through the following outcomes.

The project produced several efficient stochastic simulation algorithms, including a K-skip algorithm, an unbiased tau-leap method, and a weighted leap method. These algorithms enable simulation of gene networks and, more generally, of chemical reaction systems. They were applied to the gene network underlying circadian rhythms in Drosophila and to a network of three important human genes, p53, MDM2, and MDMX. These simulations revealed the robustness of circadian rhythms and the stochastic oscillation of p53 in the presence of DNA damage.

The project also produced several algorithms for identifying gene-gene interactions, including empirical Bayesian Lasso algorithms and an empirical Bayesian elastic net algorithm. These efficient algorithms can analyze tens of thousands of gene pairs simultaneously in a single model to identify gene-gene interactions associated with particular phenotypes or disease statuses. Applied to quantitative trait locus (QTL) mapping for rice yield, they produced a list of genes and gene-gene interactions that influence yield. Applied to genome-wide association studies (GWAS), they can detect gene-gene interactions involved in genetic diseases.

A biophysical model and an associated inference method were developed to identify gene-gene interactions that regulate the splicing of precursor messenger RNA (pre-mRNA). These help in understanding the diversity of life and in identifying dysregulated genes in diseases such as cancer.
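The K-skip, unbiased tau-leap, and weighted leap methods are not reproduced here. As background for the leap-type methods named above, the following sketch shows the classic explicit (Poisson) tau-leap method. Instead of simulating every reaction event, each reaction fires a Poisson-distributed number of times in each fixed step `tau`, with mean equal to its propensity times `tau`. Plain tau-leaping is approximate and can push populations below zero. This sketch clamps negative counts at zero, a crude assumption that itself introduces bias. The birth-death model, the names, and the parameter values are hypothetical.

```python
import numpy as np


def tau_leap(x0, stoich, propensities, tau, t_end, seed=None):
    """Explicit Poisson tau-leaping with a fixed step size tau.

    stoich: (n_reactions, n_species) state-change matrix.
    propensities: function mapping the state x to an array of n_reactions rates.
    """
    rng = np.random.default_rng(seed)
    stoich = np.asarray(stoich, dtype=np.int64)
    x = np.asarray(x0, dtype=np.int64)
    t = 0.0
    trajectory = [x.copy()]
    while t < t_end:
        a = propensities(x)
        firings = rng.poisson(a * tau)   # events of each reaction in [t, t + tau)
        x = x + firings @ stoich         # apply all state changes at once
        x = np.maximum(x, 0)             # crude guard: clamp negative counts (adds bias)
        t += tau
        trajectory.append(x.copy())
    return np.array(trajectory)


k, gamma = 10.0, 1.0
stoich = [[1], [-1]]   # transcription: +1 mRNA; degradation: -1 mRNA


def props(x):
    return np.array([k, gamma * x[0]])


traj = tau_leap([0], stoich, props, tau=0.05, t_end=2000.0, seed=2)
print(f"mean mRNA after burn-in: {traj[1000:, 0].mean():.2f}")
```

Each step costs one batch of Poisson draws, however many reactions fire within it. This is why leap methods can be much faster than exact simulation when reaction counts are large, at the price of a step-size-dependent approximation error.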
An efficient algorithm was also developed for inferring the structure of gene networks by integrating gene expression data with genetic perturbations. It not only improves inference accuracy but also enables learning of causal regulatory relationships among genes. Finally, the research results of this project were incorporated into the machine learning course at the University of Miami, enhancing its content.

These outcomes have broad impact in biology, engineering, and other fields such as the statistical analysis of big data. The stochastic simulation algorithms will facilitate understanding of gene network dynamics; they can also be applied to simulate traffic in telecommunication networks. The empirical Bayesian Lasso and elastic net algorithms can be employed in QTL mapping and GWAS to identify genes and gene-gene interactions involved in particular phenotypes and diseases. In particular, the list of genes resulting from QTL mapping for rice yield can be exploited in crop breeding to improve rice yield. These algorithms can also be applied to engineering problems such as compressed sensing in signal processing, and to the analysis of big data described by high-dimensional sparse models. Software implementing several of the algorithms is freely available either on the website of the journal that published the algorithm or on the principal investigator's homepage.

The project provided a range of training to the four graduate students involved. The outreach effort provided a high-school student with training in computer programming, bioinformatics, and data analysis.
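The empirical Bayesian Lasso and elastic net algorithms are not shown here. To illustrate the kind of high-dimensional sparse model they address, the sketch fits an ordinary, non-Bayesian Lasso by cyclic coordinate descent. The data are hypothetical marker genotypes coded as ±1, and all pairwise products are included as candidate gene-gene interaction terms. As a result, the 210 candidate effects outnumber the 200 samples. The data, the effect sizes, and the penalty value are assumptions chosen for illustration.

```python
import itertools

import numpy as np


def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ w                      # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * w[j]        # remove feature j's contribution
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * w[j]        # add back the updated contribution
    return w


rng = np.random.default_rng(1)
n, p = 200, 20
G = rng.choice([-1.0, 1.0], size=(n, p))            # hypothetical marker genotypes
pairs = list(itertools.combinations(range(p), 2))   # 190 candidate interactions
X = np.hstack([G, np.column_stack([G[:, i] * G[:, j] for i, j in pairs])])
# Hypothetical truth: one main effect (g0) and one epistatic effect (g3*g7).
y = 2.0 * G[:, 0] + 1.5 * G[:, 3] * G[:, 7] + rng.normal(0.0, 1.0, n)
y = y - y.mean()

w = lasso_cd(X, y, lam=0.3)
names = [f"g{i}" for i in range(p)] + [f"g{i}*g{j}" for i, j in pairs]
selected = {names[j]: round(float(w[j]), 2) for j in np.flatnonzero(np.abs(w) > 1e-8)}
print(selected)
```

The L1 penalty sets most coefficients exactly to zero, which lets one model screen many interaction terms at once. In a Bayesian treatment, the fixed penalty is replaced by a prior on the coefficients. An empirical Bayes method then estimates that prior's hyperparameters from the data rather than fixing them by hand.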

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0746882
Program Officer: John Cozzens
Budget Start: 2008-01-01
Budget End: 2013-12-31
Fiscal Year: 2007
Total Cost: $400,000
Institution: University of Miami
City: Coral Gables
State: FL
Country: United States
Zip Code: 33146