Prediction of RNA secondary structure is important in many areas of molecular biology. For example, in phylogenetic studies, the predicted secondary structure of "16S-like" rRNA is an important piece of information used in determining the correct alignment of sequences. The "well-defined" regions of the multiply aligned sequences are then used to make phylogenetic inferences. Secondary structures can be used in part to explain translational controls in mRNA and replication controls in single-stranded RNA viruses. RNA structure also plays an important role in the regulation of retroviruses and cellular messenger RNAs. Secondary structure modeling can be used as a first step toward the more intricate process of three-dimensional modeling, for example of ribosomal RNA or of catalytic RNAs such as group I introns. The folding patterns of individual RNA molecules are most often predicted using energy minimization approaches based on dynamic programming algorithms. Optimizing the most modern versions of these algorithms for highly parallel computers will allow these methods to be applied to biological systems previously too large for such calculations. Although the RNA folding problem is ill-conditioned, this drawback can be mitigated by the use of special "energy dot plots" that show the superposition of all possible foldings in the vicinity of the global energy minimum. Multiple predicted foldings indicate either the inability of the present modeling procedure to yield an unambiguous answer, or the actual presence of multiple structures (or both). We are now focusing on the MFOLD program, a more recent version of the folding program that incorporates these advances. Several cycles of optimization have been performed on MFOLD. First, the existing SGI version was simply ported to the Paragon.
As expected, this version was not particularly efficient, since it required large global data arrays and frequent communication. In a second step, it was determined that the computationally intensive part of the code lies in the calculation of multiply branched loops. After this part of the computation was optimized, the code performed relatively well but was limited to relatively small molecules because it replicated the stored energy values on each processor. Finally, a distributed-memory version has been implemented that allows molecules of at least 20,000 bases to be analyzed. This appears to be both the fastest and the most memory-efficient implementation currently available. In the next year we will make the MFOLD code publicly available through the NBCR by means of a web-based server. We will also port the code to the CRAY T3D, which should yield substantially faster code and increase the maximum size of molecule that can be analyzed. This should make it practical to analyze many unprocessed primary RNA transcripts as well as complete viral RNA molecules.
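
The dynamic programming approach mentioned above can be illustrated with a deliberately simplified sketch. The code below is not MFOLD and does not use its energy rules. It substitutes the much simpler base-pair maximization recursion (often called the Nussinov algorithm) for true free-energy minimization. The function name `fold`, the `min_loop` parameter, and the scoring of one unit per pair are assumptions made for illustration only. The inner loop over the split point `k` is the step that combines two independently folded sub-regions; in this toy model it plays the role that the multiply branched loop calculation plays in a full energy model, and it is what makes the run time grow roughly with the cube of the sequence length.

```python
from typing import List, Tuple

# Assumption: Watson-Crick pairs plus G-U wobble pairs, each scored as 1.
# A real energy model would use stacking and loop free energies instead.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Return (number of pairs, dot-bracket string) for one optimal folding.

    best[i][j] holds the maximum number of pairs in seq[i..j]; a pair (k, j)
    is allowed only if at least `min_loop` bases lie between k and j.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    best: List[List[int]] = [[0] * n for _ in range(n)]

    # Fill the table by increasing sub-sequence length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)
```

For the short hairpin-like sequence `GGGAAAUCCC`, `fold` reports three base pairs. The table `best` has on the order of N squared entries, so a copy of it on every processor quickly becomes the limiting factor for long sequences, which is consistent with the motivation given above for moving from replicated to distributed storage.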