A principal goal of the NHGRI is to develop methods for comprehensively identifying functional elements in genome sequences, in order to establish genomic parts lists as foundations for large-scale biology. The long-term objective of the research program described in this proposal is to develop new computational approaches for identifying genomic features using probabilistic modeling methods. This proposal focuses specifically on identifying and characterizing the numerous genes that produce structural, regulatory, and catalytic RNAs. Current methodology is not yet up to the task of systematic enumeration of the RNA genes in any genome. Noncoding RNAs pose interesting challenges for computational sequence analysis, and motivate approaches substantially different from standard primary sequence alignment methods. The proposed methods use comparative sequence analysis and a class of probabilistic models called stochastic context-free grammars (SCFGs), which are well suited to modeling the evolutionary conservation of both RNA secondary structure and RNA sequence.
Five specific aims are proposed. The human genome and the genomes of two major model animal systems, the worm Caenorhabditis and the fly Drosophila, will be screened computationally for new RNA genes using comparative genome sequence information and an SCFG-based structural RNA genefinding program, QRNA.
Three aims propose improvements in the speed of SCFG-based RNA structural homology searches: an a priori banded dynamic programming alignment method, extension of the BLAST algorithm to RNA structure alignment, and a constrained Sankoff algorithm for simultaneous alignment and folding of two homologous RNAs.
A final aim proposes a method for identifying the mRNA targets of regulatory RNAs (such as the newly discovered microRNAs) by comparative genome analysis.
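As a rough illustration of why grammar-style models suit RNA secondary structure, the sketch below counts the maximum number of nested base pairs in a sequence with a Nussinov-style dynamic program. This is not the SCFG machinery of QRNA or of the proposed search methods; it is a simplified, non-probabilistic analogue that shares the same nested recursion over subsequences. The function name `max_base_pairs`, the `min_loop` parameter, and the set of allowed pairs are assumptions made for this example.

```python
# Minimal Nussinov-style sketch (an assumption-laden illustration, not QRNA):
# count the maximum number of nested base pairs in an RNA sequence.

# Watson-Crick pairs plus G-U wobble pairs (chosen for this example).
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases required between
    the two partners of a pair (a hypothetical default of 3).
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            best = dp[i][j - 1]
            # Case 2: position j pairs with some k, splitting the problem
            # into an outer-left part and the part enclosed by the pair.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

The recursion pairs two distant positions and then recurses on the enclosed region, which is the kind of nested long-range dependency that a context-free grammar can express and that a primary sequence alignment model cannot.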

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
2R01HG001363-07
Application #
6728039
Study Section
Special Emphasis Panel (ZRG1-SSS-H (90))
Program Officer
Good, Peter J
Project Start
1996-03-01
Project End
2008-11-30
Budget Start
2003-12-24
Budget End
2004-11-30
Support Year
7
Fiscal Year
2004
Total Cost
$374,607
Indirect Cost
Name
Washington University
Department
Genetics
Type
Schools of Medicine
DUNS #
068552207
City
Saint Louis
State
MO
Country
United States
Zip Code
63130
Jung, Seolkyoung; Swart, Estienne C; Minx, Patrick J et al. (2011) Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res 39:7529-47
Johnson, L Steven; Eddy, Sean R; Portugaly, Elon (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11:431
Nawrocki, Eric P; Kolbe, Diana L; Eddy, Sean R (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25:1335-7
Nawrocki, Eric P; Eddy, Sean R (2007) Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol 3:e56
Dowell, Robin D; Eddy, Sean R (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400
Darnell, Jennifer C; Fraser, Claire E; Mostovetsky, Olga et al. (2005) Kissing complex RNAs mediate interaction between the Fragile-X mental retardation protein KH2 domain and brain polyribosomes. Genes Dev 19:903-18
Dowell, Robin D; Eddy, Sean R (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 5:71
Klein, Robert J; Eddy, Sean R (2003) RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics 4:44
McCutcheon, John P; Eddy, Sean R (2003) Computational identification of non-coding RNAs in Saccharomyces cerevisiae by comparative genomics. Nucleic Acids Res 31:4119-28
Zmasek, Christian M; Eddy, Sean R (2002) RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics 3:14

Showing the most recent 10 out of 22 publications