Understanding diseases and their treatments rests on knowledge of the three-dimensional structures of proteins, nucleic acids and other molecules. The availability of atomic-level structures, advances in understanding their mechanisms, and the highly efficient computational methodology we have developed enable us to study RNA structure and function, which are no less important than those of proteins. Our research on discovering important structural features in RNA sequences focuses on developing computational methods to find such elements and predict their structures, and on applying these methods to specific sequences.

1. Algorithms for discovery of RNA functional elements.

SigED, a new program to find well-ordered structures in RNA sequences: In a previously developed program, EDscan, we defined an RNA segment as well-ordered if its optimally folded structure is significantly more stable than the best structure in which all of the optimal base pairs are prohibited. In SigED we refine the statistical evaluation of such predictions through extensive simulations with both the natural sequence and randomized versions of it. We further separate the analysis into measures for the duplex stem and for the open loop parts. The ability to evaluate the loop portion separately may provide information useful for the design of antisense agents.

A protocol combining SigED with the previously developed EDscan, Scanfd and rna_match was developed to facilitate searching for potential well-folded segments (WFS) in genome-sized sequences: We first search for distinct WFS with high statistical significance by scanning successive segments along a genomic sequence with EDscan and SigED. We then compute the folded secondary structures of the detected WFS with Scanfd.
Finally, we use the program rna_match to search for distinct stem-loop structures that either closely resemble the structural morphologies of known miRNAs or have structural features that are highly conserved across diverse species. We have applied the protocol to the whole genome of C. elegans (see below) and to several microbial genomes.

Web-based versions of the programs EDscan, SigStb and SegFold: Online users can search a sequence for unusual folding regions and/or well-ordered folding patterns (WFS). The programs RNAGA and EFFold can be used to predict secondary structures. They can be accessed at http://protein3d.ncifcrf.gov/shuyun/rna2d.html.

RNAGA2: In collaboration with Dr. J.-H. Chen, we have developed a new version of RNAGA that uses a genetic algorithm to search for a secondary structure common to a number of phylogenetically related sequences, without requiring the sequences to be pre-aligned. The new version is faster and more user-friendly.

2. Data mining of functional elements in RNA sequences.

miRNAs: Our results on the complete Caenorhabditis elegans genome indicate that the statistically significant WFS detected by EDscan and SigED in genomic sequences coincide with the fold-back stem-loops of the 55 known miRNA precursors. Combining EDscan and SigED makes it possible to run such computational experiments on a large scale. In general, we first find potentially interesting regions in a genomic sequence with the faster EDscan, sliding fixed-length windows of a range of sizes along the sequence. We then use SigED to evaluate the probability that each WFS arises by chance, and localize the critical sequences for detailed analysis of their structure.

Common structural features of cellular IRESs and data mining of 5'UTRs in cellular mRNAs: Statistical analyses of current databases show that approximately 16% of vertebrate mRNA 5'UTRs are longer than 300 nt.
It is now widely accepted that long 5'UTRs containing multiple silent upstream AUG (uAUG) triplets play an important role in translational control. Experimental studies have revealed that a special sequence, termed the internal ribosome entry site (IRES), allows the translational machinery to skip over the uAUGs, interact with the IRES and start translation at the true initiator codon. Using an integrated method built from the computer programs described above, we found a structural feature common to known cellular IRES elements: a Y-shaped stem-loop followed by a short sequence, complementary to the 3' end of human 18S rRNA, located immediately upstream of the initiator. We used this conserved structural feature as a search pattern to screen cellular mRNAs from tumor-associated proto-oncogenes and cell factors. We discovered a common structural motif in various 5'UTRs of cellular mRNAs encoding oncoproteins and cell factors related to cell proliferation. These 5'UTRs are long and have high G+C content and/or multiple uAUGs. The structural motif may play a functional role in IRES-mediated translation initiation of cellular mRNAs.

The PTH mRNA 3'UTR AU-rich element is an unstructured functional element: Parathyroid hormone (PTH) is regulated post-transcriptionally, and this regulation depends on the binding of protective trans-acting factors to a 63-nt AU-rich element (ARE) in the 3'UTR of PTH mRNA. Both computational analysis (SigStb and EDscan) and experimental analysis by RNase H indicate that the PTH mRNA 3'UTR, and the ARE in particular, is dominated by open regions with statistically significant unstable folding. Mutational analysis of this open region demonstrates the importance of the conserved 26-nt sequence element of the ARE in protein-RNA binding. Our results show that the cis-acting ARE is an unstable folding region in which the highly conserved sequence pattern is involved in the interaction with trans-acting factors. (This research was completed in collaboration with Dr.
Talla Naveh-Many and her laboratory at the Hebrew Univ. Medical School, Jerusalem, Israel.)

Mapping and characterization of the minimal IRES in the human c-myc mRNA 5'UTR: Previous studies demonstrated that the region of c-myc transcripts between nt -363 and -94 upstream of the CUG start codon contains an internal ribosome entry site (IRES). In collaboration with Dr. Veronique Kruys and her laboratory (Laboratoire de Chimie Biologique, IBMM, Univ. Libre de Bruxelles), we mapped a 50-nt segment (-143 to -94) that is sufficient to promote internal translation initiation of c-myc ORFs. Interestingly, this 50-nt element can be further dissected into two 14-nt segments, each capable of activating internal translation initiation. Moreover, we show that this element acts as the ribosome landing site from which the 43S pre-initiation ribosomal complex scans the mRNA until it reaches the CUG or AUG start codon. It is possible that the core element of the c-myc IRES depends on short sequence elements rather than on long, highly structured sequences.
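The well-ordered criterion used by EDscan and SigED in section 1 can be made concrete with a small sketch: fold a segment, prohibit every base pair of the optimal structure, refold, and take the difference (ED); then compare that ED with the EDs of randomized copies of the sequence. This is not EDscan or SigED. It assumes a toy Nussinov-style base-pair maximization in place of free-energy minimization (more pairs stands in for "more stable"), and the names `fold`, `energy_difference`, `significance`, `MIN_LOOP`, the mononucleotide shuffle and the empirical p-value are hypothetical choices made for illustration.

```python
import random
import statistics

# Toy stand-in for a thermodynamic folding model (an assumption, not the
# energy model of EDscan/SigED): maximize the number of nested Watson-Crick
# and G-U pairs. A higher score plays the role of a more stable structure.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # hypothetical minimum hairpin-loop length


def fold(seq, forbidden=frozenset()):
    """Return (score, pairs) of one optimal structure using no pair in `forbidden`."""
    n = len(seq)
    if n == 0:
        return 0, []
    dp = [[0] * n for _ in range(n)]

    def can_pair(k, j):
        return (seq[k], seq[j]) in PAIRS and (k, j) not in forbidden

    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with some k
                if can_pair(k, j):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback recovers one optimal set of (i, j) pairs.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if can_pair(k, j):
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + dp[k + 1][j - 1] + 1:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], pairs


def energy_difference(seq):
    """ED-style score: optimal score minus the best score obtainable when
    every pair of the optimal structure is prohibited (larger = better ordered)."""
    best, pairs = fold(seq)
    alt, _ = fold(seq, forbidden=frozenset(pairs))
    return best - alt


def significance(seq, n_shuffles=100, seed=0):
    """Compare the ED of `seq` with EDs of mononucleotide-shuffled copies.

    Returns (ED, z-score, empirical p-value). The shuffling scheme is an
    assumption; SigED's own randomization may differ.
    """
    rng = random.Random(seed)
    observed = energy_difference(seq)
    null = []
    for _ in range(n_shuffles):
        chars = list(seq)
        rng.shuffle(chars)
        null.append(energy_difference("".join(chars)))
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    z = (observed - mean) / sd if sd > 0 else 0.0
    p = (1 + sum(1 for x in null if x >= observed)) / (n_shuffles + 1)
    return observed, z, p


if __name__ == "__main__":
    ed, z, p = significance("GGGAGCUCAAAAGAGCUCCC")
    print(f"ED={ed} z={z:.2f} p={p:.3f}")
```

A real analysis would replace `fold` with a thermodynamic folding routine and report energies in kcal/mol; what carries over is the shape of the test, an observed ED judged against a null distribution built from randomized versions of the same sequence.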
Le, Shu-Yun; Chen, Jih-H; Konings, Danielle et al. (2003) Discovering well-ordered folding patterns in nucleotide sequences. Bioinformatics 19:354-61