The objective of the project is a rigorous mathematical analysis of stochastic context-free grammars for RNA secondary structure modeling. Emphasis is placed on how the prediction depends on the type of production rules and the probability parameters used, with the goal of better understanding the limitations imposed by the choices made during grammar design. The results should suggest ways of modifying the grammars that would improve the prediction.
The scientific view of RNA molecules as passive transmitters of the genetic code has changed drastically during the last few decades, and the known biological functions of RNA continue to grow in number and expand in scope. To perform its function, an RNA nucleotide chain must fold into a specific three-dimensional shape. Thus, understanding how RNA performs its function is closely tied to knowing its structure. Since experimental determination of the structure is expensive and time-consuming, predicting the structure of an RNA molecule is an important problem in computational biology. This project will investigate computational methods for RNA structure prediction based on stochastic context-free grammars and the effects of changing different components of the model on the accuracy of the prediction.