Interpretation of genome sequence data relies heavily upon computational analysis. The goals of the Human Genome Project include the development of improved computer algorithms for more accurate identification of genes and for more sensitive recognition of homologies. Both kinds of algorithms have been improved by the use of probabilistic modeling methods. The first half of this proposal uses probabilistic modeling methods to identify noncoding RNA genes in genome sequences. Most research into genefinding algorithms has understandably focused on protein coding genes. However, an unknown number of genes make functional noncoding RNAs instead of coding for proteins. Three different computational approaches are proposed. First, a computational screen for about 10-18 new pseudouridylation guide small nucleolar RNA genes in Saccharomyces cerevisiae is proposed, based on structural and sequence homology. Second, a probabilistic model of RNA secondary structure will be developed for use as an RNA genefinder program, identifying novel structural RNAs by significant secondary structure content. Third, comparative sequence analysis of the Caenorhabditis elegans and Caenorhabditis briggsae genomes will be used to identify conserved sequences that do not correspond to coding regions. The second part of the proposal focuses on using profile hidden Markov models to improve functional annotation of predicted protein coding genes. HMMER, a profile HMM software package, will continue to be supported and developed, and will continue to support the PFAM database of more than 800 common protein domains. Additionally, a "simulated evolution" algorithm is proposed for increasing HMMER's sensitivity.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG001363-05
Application #
6181623
Study Section
Genome Study Section (GNM)
Program Officer
Brooks, Lisa
Project Start
1996-03-01
Project End
2002-03-31
Budget Start
2000-04-01
Budget End
2001-03-31
Support Year
5
Fiscal Year
2000
Total Cost
$343,227
Indirect Cost
Name
Washington University
Department
Genetics
Type
Schools of Medicine
DUNS #
062761671
City
Saint Louis
State
MO
Country
United States
Zip Code
63130
Jung, Seolkyoung; Swart, Estienne C; Minx, Patrick J et al. (2011) Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res 39:7529-47
Johnson, L Steven; Eddy, Sean R; Portugaly, Elon (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11:431
Nawrocki, Eric P; Kolbe, Diana L; Eddy, Sean R (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25:1335-7
Nawrocki, Eric P; Eddy, Sean R (2007) Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol 3:e56
Dowell, Robin D; Eddy, Sean R (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400
Darnell, Jennifer C; Fraser, Claire E; Mostovetsky, Olga et al. (2005) Kissing complex RNAs mediate interaction between the Fragile-X mental retardation protein KH2 domain and brain polyribosomes. Genes Dev 19:903-18
Dowell, Robin D; Eddy, Sean R (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 5:71
Klein, Robert J; Eddy, Sean R (2003) RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics 4:44
McCutcheon, John P; Eddy, Sean R (2003) Computational identification of non-coding RNAs in Saccharomyces cerevisiae by comparative genomics. Nucleic Acids Res 31:4119-28
Zmasek, Christian M; Eddy, Sean R (2002) RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics 3:14

Showing the most recent 10 out of 22 publications