RNA molecules are an important component of the cellular machinery. They are now known to be essential for numerous biological processes, including protein synthesis, transcription regulation, chromosome replication, viral infection, and RNA interference. However, our knowledge of RNA molecules is still limited. This research project fills important gaps in current RNA studies by introducing novel molecular models and efficient computational tools. Specifically, the research team aims to solve the following problems under the coherent theme of studying pseudoknotted RNA structures and understanding their properties: (1) estimating the entropy of key secondary structure elements of RNA molecules; (2) identifying stable pseudoknot motifs from RNA sequences and developing libraries of pseudoknot motifs for RNA families; (3) predicting three-dimensional ensembles of pseudoknotted RNA molecules and characterizing their folding mechanisms. All these problems involve the exploration of probability distributions on very large state spaces, for which novel mathematical and statistical tools must be developed. To this end, the research team studies and develops several techniques, including efficient constrained Sequential Monte Carlo (SMC) methods, efficient Markov Chain Monte Carlo (MCMC) methods, mixing rate acceleration schemes, and their combinations. The methodological development provides a solid foundation for solving the underlying biological problems. In return, those problems serve as a testing ground and an inspiration for new statistical ideas and procedures. This cross-fertilization is ideal for significant advances in both the biological and statistical sciences. It provides an excellent environment for the education and training of the next generation of scientists and researchers in the interdisciplinary field of mathematics/statistics and biology. Integrated education and research activities are conducted at the post-doctoral, graduate, and undergraduate levels. A set of free software tools is produced to implement the developed algorithms.

This project intends to improve our understanding of RNA, an important class of biomolecules and an important component of the cellular machinery. RNA molecules are now known to be essential for numerous biological processes. A deeper understanding of RNA, its dynamics, and its functionality will increase our ability to develop new medicines and diagnostic procedures and will propel further technological advancement, to the benefit of society. Innovative statistical tools are developed to solve the underlying problems, and such tools can also be used in many other applications. The project is a cross-fertilization between statistical science and bioinformatics, computational biology, and biophysics. It provides an excellent environment for the education and training of the next generation of scientists and researchers in the interdisciplinary field of mathematics/statistics and biology. Integrated education and research activities are conducted at the post-doctoral, graduate, and undergraduate levels, and special attention is paid to attracting women and minority students to research careers in mathematical biology. A set of free, public software tools is developed to implement the developed algorithms, giving biologists and bioinformatics researchers new algorithms and software for their own research and discovery.

Project Report

RNA is an important class of biomolecules. Despite rapid recent progress, predicting RNA structures and assessing their stabilities remain challenging tasks. Two difficulties are the calculation of the conformational entropy of RNA loops and the prediction of secondary structures of pseudoknotted RNA molecules. We have made significant progress and have developed an efficient algorithm to compute the entropies of key RNA secondary structural loop motifs, including hairpins, bulges, internal loops, and multibranch loops, for loop lengths of up to 50. Our results correct errors in previous Jacobson-Stockmayer models for complex RNA secondary structures such as internal loops and multibranch loops. We have also developed a new method to predict the secondary and tertiary structures of pseudoknotted RNA molecules. Using an approximately optimal algorithm to assemble all possible low-energy RNA stem regions, while maintaining spatial excluded volume and strictly enforcing self-consistency effects, our approach treats nested and pseudoknotted structures alike in one unifying physical framework, regardless of how complex the RNA structure is. Our method can predict pseudoknotted RNA secondary structures with sensitivity and specificity. It can also generate native-like spatial arrangements of the secondary structural elements of pseudoknotted RNA molecules. In addition, we can now identify the structural basis of the experimentally measured folding behavior of pseudoknotted RNA molecules. Furthermore, we can design pseudoknotted RNA molecules with altered folding mechanisms based on computational predictions. We have further generalized our approach of sequential importance sampling in severely constrained configurational space, and have developed methods for generating structural ensembles of the folding transition state of biomolecules, as well as for barrier crossing in the high-dimensional space of reaction networks. We expect our results to be widely applicable in practical studies of pseudoknotted RNA molecules and in the design of novel RNA molecules.
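
As a rough illustration of sequential importance sampling in a constrained configurational space, the sketch below estimates the number of self-avoiding walks on a 2D square lattice and takes the logarithm of that count as a dimensionless conformational entropy. This is a toy example only, assuming lattice walks as a stand-in for chain conformations; it is not the project's algorithm, and the names `sample_walk_weight` and `estimate_entropy` are hypothetical.

```python
import math
import random

# Unit moves on a 2D square lattice (a toy stand-in for chain growth).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def sample_walk_weight(n_steps, rng):
    """Grow one self-avoiding walk step by step.

    Returns the importance weight of the walk: the product of the number
    of unoccupied neighbours available at each step, or 0.0 if the walk
    gets trapped before reaching n_steps.
    """
    pos = (0, 0)
    visited = {pos}
    weight = 1.0
    for _ in range(n_steps):
        free = [
            (pos[0] + dx, pos[1] + dy)
            for dx, dy in MOVES
            if (pos[0] + dx, pos[1] + dy) not in visited
        ]
        if not free:
            return 0.0  # dead end: this sample contributes nothing
        weight *= len(free)  # corrects for the non-uniform growth proposal
        pos = rng.choice(free)
        visited.add(pos)
    return weight


def estimate_entropy(n_steps, n_samples=20000, seed=0):
    """Estimate the number of self-avoiding walks and its logarithm.

    The mean importance weight is an unbiased estimate of the number of
    n_steps-step self-avoiding walks; its log is an entropy in units of k_B.
    """
    rng = random.Random(seed)
    total = sum(sample_walk_weight(n_steps, rng) for _ in range(n_samples))
    count = total / n_samples
    return count, math.log(count)


if __name__ == "__main__":
    count, entropy = estimate_entropy(4)
    print(f"estimated walks: {count:.2f}, entropy/k_B: {entropy:.3f}")
```

For four steps the exact number of self-avoiding walks on this lattice is 100, so the estimate can be checked directly. Each walk is grown only through allowed states and reweighted by the number of choices it had, which is the basic idea that lets sampling of this kind work when most configurations violate the excluded-volume constraint.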

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0800183
Program Officer: Mary Ann Horn
Project Start:
Project End:
Budget Start: 2008-07-15
Budget End: 2013-06-30
Support Year:
Fiscal Year: 2008
Total Cost: $689,553
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: New Brunswick
State: NJ
Country: United States
Zip Code: 08901