To date, post-genomic analysis has focused on the identification and annotation of protein-coding genes and their products. In comparison, the identification of DNA sequences encoding functional RNAs has been largely neglected. Functional RNAs can act in trans, as molecules that interact with other RNAs or proteins (for example tRNAs, rRNAs, RNase P RNA, small nucleolar RNAs and microRNAs); they can also function in cis, as untranslated regions that regulate post-transcriptional expression of mRNAs. Recent experimental and computational studies suggest that there may be hundreds of unknown functional RNAs in prokaryotes and thousands in eukaryotes.

The primary goal of this proposal is to develop a computational approach to the identification of novel RNA genes and functional RNA elements in complete genome sequences. Machine learning techniques will be used to recognize the hallmarks of functional RNA coding sequences by comparison with sequences that do not encode RNAs. Several types of signals are useful in discriminating functional RNAs, including: 1) differences in global sequence composition, 2) calculated RNA secondary structure features, e.g. free energy of folding, and 3) specific sequence elements common to RNA structures. These and other features can be rapidly evaluated using machine learning methods. The method will be optimized for individual genomes with respect to input databases, parameterization, and choice of machine learning method and architecture. Results will be evaluated computationally by cross-validation testing, comparative genomics, and calculated secondary structure and free energy of folding. Experimental studies of RNA expression and function will be conducted with collaborators.

Preliminary studies have demonstrated the ability of this approach to predict novel functional RNAs in E. coli, other bacteria, and archaea, as supported by computational cross-validation and experimental confirmation.
We will test and apply this approach to discover new functional RNAs in eukaryotic genomes, including those of S. cerevisiae, C. elegans, and humans. RNA prediction for these larger and more complex genomes will require optimization of the computational parameters, as well as development of appropriate input datasets and training algorithms.

The prediction of novel functional RNAs in the human genome presents an opportunity to understand new regulatory and developmental processes. Known RNAs implicated in human disease, such as telomerase RNA (cancer, aging), XIST RNA (X-chromosome inactivation), and BIC (a proto-oncogene), underscore the importance of developing a method to identify the full complement of human RNA genes.
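The abstract names cross-validation as the main computational check on predictions and notes that larger genomes will need retuned parameters, input datasets and training algorithms. A minimal sketch of k-fold cross-validation follows, using a nearest-centroid classifier on numeric feature vectors as a hypothetical stand-in; the proposal does not specify this classifier, and the function names and toy data below are illustrative assumptions, not results.

```python
import math
import random

Vector = list[float]


def train_centroids(X: list[Vector], y: list[int]) -> dict[int, Vector]:
    """Mean feature vector per class (e.g. 1 = functional RNA, 0 = background)."""
    sums: dict[int, Vector] = {}
    counts: dict[int, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids: dict[int, Vector], x: Vector) -> int:
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def k_fold_accuracy(X: list[Vector], y: list[int], k: int = 5, seed: int = 0) -> float:
    """Mean held-out accuracy over k folds (hypothetical evaluation helper)."""
    if not 2 <= k <= len(X):
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducible folds
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        centroids = train_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(centroids, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)
```

Every sample is held out exactly once, so the score estimates performance on sequences the classifier was not trained on. Swapping in a different learner, or a different feature set per genome, only requires replacing `train_centroids` and `predict`.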

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG002665-01
Application #
6558575
Study Section
Special Emphasis Panel (ZRG1-GNM (05))
Program Officer
Good, Peter J
Project Start
2003-06-01
Project End
2006-04-30
Budget Start
2003-06-01
Budget End
2004-04-30
Support Year
1
Fiscal Year
2003
Total Cost
$377,162
Indirect Cost
Name
Lawrence Berkeley National Laboratory
Department
Other Basic Sciences
Type
Organized Research Units
DUNS #
078576738
City
Berkeley
State
CA
Country
United States
Zip Code
94720
Stefan, Liliana R; Zhang, Rui; Levitan, Aaron G et al. (2006) MeRNA: a database of metal ion binding sites in RNA structures. Nucleic Acids Res 34:D131-4
Hendrix, Donna K; Brenner, Steven E; Holbrook, Stephen R (2005) RNA structural motifs: building blocks of a modular biomolecule. Q Rev Biophys 38:221-43
Holbrook, Stephen R (2005) RNA structure: the long and the short of it. Curr Opin Struct Biol 15:302-8