Prediction of RNA secondary structure is important in many areas of molecular biology. In phylogenetic studies, for example, the predicted secondary structure of "16S-like" rRNA is an important piece of information for determining the correct alignment of sequences; the "well-defined" regions of the multiply aligned sequences are then used to make phylogenetic inferences. Secondary structures can also help explain translational controls in mRNA and replication controls in single-stranded RNA viruses, and RNA structure plays an important role in the regulation of retroviruses and cellular messenger RNAs. Secondary structure modeling can serve as a first step toward the more intricate process of three-dimensional modeling, for example of ribosomal RNA or of catalytic RNAs such as group I introns.

The folding patterns of individual RNA molecules are most often predicted by energy minimization approaches based on dynamic programming algorithms. Optimizing the most recent versions of these algorithms for highly parallel computers will allow these methods to be applied to biological systems that were previously too large for such calculations. Although the RNA folding problem is ill-conditioned (small changes in the energy parameters can change the predicted optimal folding), this drawback can be mitigated with special "energy dot plots", which superimpose all possible foldings close to the global energy minimum. Multiple predicted foldings indicate that the present modeling procedure cannot yield an unambiguous answer, that the molecule actually adopts multiple structures, or both. We are now focusing on MFOLD, a more recent version of the folding program that incorporates these advances. MFOLD has gone through several cycles of optimization. First, the existing SGI version was simply ported to the Intel Paragon.
As expected, this version was not particularly efficient, since it required large global data arrays and frequent communication. Next, we determined that the computationally intensive part of the code is the calculation of multibranched loops. After this part of the computation was optimized, the code performed reasonably well, but it was limited to relatively small molecules because the stored energy values were replicated on every processor. Finally, we implemented a distributed-memory version that can analyze molecules of at least 20,000 bases. It appears to be both the fastest and the most memory-efficient implementation currently available.

This year, we worked on porting the parallel code from the Paragon to the Cray T3E. This required translating the parallel communication calls from the NX library to MPI, along with many declaration changes and other modifications needed to compile the code on the T3E. The port is now nearly 95% complete; a small error remains in the traceback routine. In the near future, we will finish testing the code and add a few communication optimizations that should improve performance. Next year, we will make MFOLD publicly available through the NBCR as an additional transparent supercomputing service.
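The dynamic-programming idea behind these folding programs can be illustrated with a much simpler model. MFOLD minimizes free energy using loop-dependent energy parameters; the Python sketch below instead uses the Nussinov base-pair-maximization recurrence, so it is not MFOLD's method and its output is not an energy-based prediction. The pairing rules (Watson-Crick plus G-U), the minimum hairpin size `MIN_HAIRPIN = 3`, and the function names `fill`, `traceback`, and `dot_bracket` are illustrative assumptions. Two features do carry over to the real problem: the inner scan over split points `k` is the O(n^3) step, loosely analogous to the multibranched-loop calculation, and every cell on one diagonal of the table depends only on shorter subsequences, which is one reason such tables lend themselves to parallel computation. This is not a description of how the Paragon or T3E versions actually distribute their data.

```python
"""Nussinov base-pair maximization: a simplified stand-in for MFOLD's
free-energy minimization (no loop energies are modeled here)."""

from typing import List, Tuple

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases inside a hairpin


def can_pair(seq: str, i: int, j: int) -> bool:
    """True if bases i and j may pair and enclose a large enough hairpin."""
    return j - i > MIN_HAIRPIN and (seq[i], seq[j]) in PAIRS


def fill(seq: str) -> List[List[int]]:
    """Return N, where N[i][j] is the maximum number of nested pairs in seq[i..j]."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    # Fill by increasing span d = j - i. Every cell on a diagonal depends
    # only on shorter spans, so the cells of one diagonal are independent.
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            best = max(N[i + 1][j], N[i][j - 1])  # base i or base j unpaired
            if can_pair(seq, i, j):
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            # Bifurcation into two independent substructures: this scan over
            # split points k is the O(n^3) part of the algorithm.
            for k in range(i + 1, j):
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N


def traceback(seq: str, N: List[List[int]]) -> List[Tuple[int, int]]:
    """Recover one optimal set of base pairs (0-based) from the filled table."""
    pairs: List[Tuple[int, int]] = []
    stack = [(0, len(seq) - 1)] if seq else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq, i, j) and N[i][j] == N[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            # The optimum must have come from some bifurcation point k.
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


def dot_bracket(n: int, pairs: List[Tuple[int, int]]) -> str:
    """Render base pairs in dot-bracket notation."""
    s = ["."] * n
    for i, j in pairs:
        s[i], s[j] = "(", ")"
    return "".join(s)


if __name__ == "__main__":
    seq = "GGGAAAUCC"
    N = fill(seq)
    print(N[0][-1], dot_bracket(len(seq), traceback(seq, N)))
```

Running the script prints `3 (((...)))`: three nested pairs (G1-C9, G2-C8, G3-U7, numbered from 1) closing a three-base hairpin loop. A real energy-minimization code replaces the simple pair count with loop energies and would, for an energy dot plot, also record which pairs occur in foldings near the minimum.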