Computational methods will play a key role in extracting medically and scientifically useful information from the complete genome sequences of humans and other model organisms. Recognition of similarities to known biological sequence families has recently been significantly enhanced by the introduction of full probabilistic consensus models. The proposed research aims to further develop probabilistic models of protein and RNA consensus structure in order to improve recognition of distantly related protein and RNA homologues. These methods will be applied to large-scale genome analysis, protein fold recognition, and RNA secondary structure prediction. Hidden Markov modeling methods will be extended to include structural information in addition to consensus sequence information from a protein family, in order to increase the sensitivity of protein fold recognition. A library of hidden Markov models of several hundred known protein structure families will be made and incorporated into the publicly available SCOP protein structure database on the World Wide Web. RNA covariance models describe RNA secondary structure in addition to sequence consensus, but their use is limited to small RNAs. Algorithmic improvements will be developed that greatly extend the useful range of covariance models. Sensitive secondary-structure-based recognition and alignment of most RNAs will be made feasible, as will consensus secondary structure prediction. Models will be developed for identifying homologues of a number of RNA gene families. These methods will be applied to the analysis of protein and RNA genes in the genome sequences of Caenorhabditis elegans, Saccharomyces cerevisiae, and other organisms obtained by genome sequencing projects in St. Louis and at the Sanger Center in Cambridge, UK.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG001363-02
Application #: 2378640
Study Section: Genome Study Section (GNM)
Project Start: 1996-03-01
Project End: 1999-02-28
Budget Start: 1997-03-01
Budget End: 1998-02-28
Support Year: 2
Fiscal Year: 1997
Total Cost:
Indirect Cost:

Name: Washington University
Department: Genetics
Type: Schools of Medicine
DUNS #: 062761671
City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Jung, Seolkyoung; Swart, Estienne C; Minx, Patrick J et al. (2011) Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res 39:7529-47
Johnson, L Steven; Eddy, Sean R; Portugaly, Elon (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11:431
Nawrocki, Eric P; Kolbe, Diana L; Eddy, Sean R (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25:1335-7
Nawrocki, Eric P; Eddy, Sean R (2007) Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol 3:e56
Dowell, Robin D; Eddy, Sean R (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400
Darnell, Jennifer C; Fraser, Claire E; Mostovetsky, Olga et al. (2005) Kissing complex RNAs mediate interaction between the Fragile-X mental retardation protein KH2 domain and brain polyribosomes. Genes Dev 19:903-18
Dowell, Robin D; Eddy, Sean R (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 5:71
Klein, Robert J; Eddy, Sean R (2003) RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics 4:44
McCutcheon, John P; Eddy, Sean R (2003) Computational identification of non-coding RNAs in Saccharomyces cerevisiae by comparative genomics. Nucleic Acids Res 31:4119-28
Zmasek, Christian M; Eddy, Sean R (2002) RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics 3:14

Showing the most recent 10 out of 22 publications