Our research goal is to develop novel algorithms that integrate statistical and computational tools for RNA folding, pattern search, and sequence and structure comparison. These tools find non-coding RNAs (ncRNAs) and functional RNA elements, and enable analyses of complete genome sequences. Our scientific accomplishments during the past year are summarized as follows:

A. Development of computer algorithms for analyses of RNA structures and the discovery of ncRNAs, RNA motifs and RNA functional elements.

Functional RNAs have characteristic RNA structural motifs represented by specific combinations of base pairings and conserved nucleotides in loop regions. Discovery of distinct well-ordered structures and their homologues in genome-wide searches will enhance our ability to discover RNA structural motifs and help us to highlight their association with functional ncRNAs and regulatory RNA elements. In collaboration with Prof. Zhang's Lab (Dept. of Computer Science, Univ. of Western Ontario, London, Ontario, Canada), we developed a novel computer algorithm, HomoStRscan, that takes a single RNA sequence along with its secondary structure and searches for homologous RNAs in complete genomes. This algorithm differs from other currently used search algorithms for homologous structures or structural motifs in two important aspects. First, it takes detailed account of both the primary sequence and the secondary structural constraints of the query RNA, including each base pair in duplexes and each nucleotide in the single strands. Second, the homologous RNA structures are strictly inferred from a robust statistical distribution of a quantitative measure, the maximal similarity score. The method provides a flexible, robust and fine-grained search tool for any homologous structural RNAs. To test this program we searched for 5S rRNA and tRNA in bacterial genome databases.
Our results from more than 20 bacterial genomes indicate that HomoStRscan discovers these ncRNAs with high sensitivity and specificity. Our computational experiments on these complete genomic sequences indicate that HomoStRscan detects 100% of the true 5S rRNAs with no false positives. Moreover, HomoStRscan finds new 5S rRNA genes in several bacterial genomes that are not currently annotated in the database.

HomoStRscan can also discover tRNAs in genomic sequences with very high sensitivity and specificity, even if those tRNAs have introns. Our test for tRNA genes in the 10.6 Mb genome of the yeast K. lactis correctly predicts all tRNA genes listed in published databases, and also predicts an additional 31 tRNAs, most of which have intron sequences of various sizes within the anticodon loop.

In general, our method can be used to search for any RNA segment that has an established secondary structure. A large-scale search for known ncRNAs is being conducted using HomoStRscan and rna_match.

B. Development of computer programs StemEd and SigStem for the statistical inference of local well-ordered structures in genomic sequences.

The discovery of microRNAs (miRNAs) suggests that there is a large class of small non-coding RNAs in eukaryotic genomes. These miRNAs have the potential to form distinct, fold-back, stem-loop structures. The prediction of these well-ordered folding sequences (WFS) in genomic sequences is very helpful for our understanding of RNA-based gene regulation and for the determination of local RNA elements with structure-dependent functions. Previously, we developed EDscan and SigED, which can discover such distinct WFS by scanning successive segments along a sequence and evaluating the difference between the E_diff of the natural sequence and the E_diff values computed from randomly shuffled sequences.
The measure E_diff of a given RNA segment is the same as was defined in the previous EDscan method: E_diff is the difference in free energy between the global minimal-energy structure folded in the segment and its corresponding optimal restrained structure (ORS), in which all the base pairings of the lowest free energy structure are forbidden. Using a standard z-score, SigZscr, we can estimate the behavior of E_diff in the real biological segment and make a robust statistical inference based on the general behavior of E_diff in the tested random sample. However, the computational complexity of EDscan and SigED is directly proportional to the cube of the scanning window length, so searching the whole human genome is very compute-intensive. Because the search for miRNA precursors in a genomic sequence concerns only relatively simple, distinct, fold-back stem-loop structures, we improved our algorithm to consider stem-loop structures only. Consequently, the new algorithms StemEd and SigStem retain all the power of EDscan and SigED, but their computational complexity is reduced to be directly proportional to the square of the window length. In addition, the predictive ability of the new method is less sensitive to the selection of the scanning window. This is especially advantageous in discovering unknown structural motifs and ncRNAs in genomic sequences.

Our results and statistical tests on all known miRNA precursors indicate that the statistically significant WFS detected by StemEd and SigStem in genomic sequences coincide with the known fold-back stem-loops found in the miRNA precursors. The statistical tests include 207 miRNA precursors of human, 208 of mouse, 187 of rat, 121 of G. gallus, 78 of Drosophila melanogaster, 116 of Caenorhabditis elegans, 50 of Caenorhabditis briggsae, 92 of Arabidopsis thaliana and 122 of Oryza sativa.

We are continuing the detailed analysis and intend to find distinct well-ordered folding patterns in other species, which we expect to be potential miRNAs.

C. Data mining of large dsRNA segments and RNA functional elements in sequence databases.

Recent developments in the study of RNA silencing indicate that double-stranded RNA (dsRNA) can be used in eukaryotes to block expression of a corresponding cellular gene. In the RNAi pathway, dsRNAs serve as the initial triggers that are cleaved by a ribonuclease termed "Dicer" and may result in aberrant mRNA. We searched for stalk-like dsRNAs in the 3'UTR database. The occurrence rate of the dsRNA structures in 3'UTRs ranges from 0.01% in plant mRNAs to 0.30% in vertebrate mRNAs. These stalk-like dsRNAs are predicted to be very significant in Monte Carlo simulations and are "well-determined" in RNA structure predictions. The distinct dsRNA structures in the database can be used to test hypotheses about the nature of endogenous dsRNAs in the 3'UTR and the possibility that they can induce RNAi.

We collaborated with Dr. Jeang's Lab (Mol. Virol. Section, Lab of Mol. Microbio., NIAID, NIH) on studies of the induction and suppression of RNA interference by HIV-1. Although short interfering RNAs have been used artificially to silence viral infections, no direct evidence existed that natural viral sequences provoke such immunity in mammalian cells. Our computation discovers a series of 19-base-pair dsRNAs in about 500 HIV and related sequences. The conserved, naturally occurring 19-base-pair dsRNA elicits antiviral RNA interference in human cells. Interestingly, HIV has evolved a suppressor of RNA silencing, embodied in its Tat protein, to combat this induced RNA interference.
Tat suppresses RNA silencing through a functional abrogation of Dicer activity. Our results suggest that pre-processed short interfering RNA (siRNA), rather than long interfering RNA (liRNA) or short hairpin RNA (shRNA), both of which require processing, should be the preferred choice for inhibiting HIV-1 infections.
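
The shuffle-based significance test described in section B can be illustrated with a small Python sketch. It is not the StemEd/SigStem implementation. A toy score (the longest run of stacked base pairs closing a hairpin) stands in for the free-energy measure E_diff, and mononucleotide shuffling is assumed for the random sample. The function names `best_stem_score` and `shuffle_zscore`, the `min_loop` parameter and the G-U pairing rule are all hypothetical choices made for illustration. The table fill visits each pair of positions once, so its cost grows with the square of the window length.

```python
import random
import statistics

# Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def best_stem_score(seq, min_loop=3):
    """Toy stand-in for E_diff: longest uninterrupted stem of any hairpin.

    stem[i][j] is the number of consecutive stacked pairs starting at the
    outer pair (i, j) and moving inward. Filling the table is O(L^2).
    """
    n = len(seq)
    stem = [[0] * n for _ in range(n)]
    best = 0
    # span = j - i; at least min_loop unpaired bases must lie between i and j.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            if (seq[i], seq[j]) in PAIRS:
                stem[i][j] = 1 + stem[i + 1][j - 1]
                best = max(best, stem[i][j])
    return best


def shuffle_zscore(seq, n_shuffles=200, seed=0):
    """Z-score of the natural segment's score against shuffled copies."""
    rng = random.Random(seed)
    observed = best_stem_score(seq)
    letters = list(seq)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves base composition only
        scores.append(best_stem_score("".join(letters)))
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        return 0.0  # no variation in the random sample, nothing to infer
    return (observed - mean) / sd


if __name__ == "__main__":
    window = "GGGGGGGGAAAACCCCCCCC"  # an artificial perfect hairpin
    print(best_stem_score(window), shuffle_zscore(window))
```

In a genome scan, this function would be applied to successive windows and only windows with a large z-score would be kept as candidates. A real analysis would replace the toy score with folding free energies from an RNA folding package.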

Agency: National Institutes of Health (NIH)
Institute: Division of Basic Sciences - NCI (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01BC008380-20
Application #: 7048218
Study Section: (LECB)
Support Year: 20
Fiscal Year: 2004
Name: Basic Sciences
Country: United States
Le, Shu-Yun; Chen, Jih-H; Konings, Danielle et al. (2003) Discovering well-ordered folding patterns in nucleotide sequences. Bioinformatics 19:354-61