Probabilistic Models of Protein and RNA Structure

Eddy, Sean

Abstract

Interpretation of genome sequence data relies heavily upon computational analysis. The goals of the Human Genome Project include the development of improved computer algorithms for more accurate identification of genes and for more sensitive recognition of homologies. Both kinds of algorithms have been improved by the use of probabilistic modeling methods. The first part of this proposal uses probabilistic modeling methods to identify noncoding RNA genes in genome sequences. Most research into genefinding algorithms has understandably focused on protein coding genes. However, an unknown number of genes make functional noncoding RNAs instead of coding for proteins. Three different computational approaches are proposed. First, a computational screen for about 10-18 new pseudouridylation guide small nucleolar RNA genes in Saccharomyces cerevisiae is proposed, based on structural and sequence homology. Second, a probabilistic model of RNA secondary structure will be developed for use as an RNA genefinder program, identifying novel structural RNAs by significant secondary structure content. Third, comparative sequence analysis of the Caenorhabditis elegans and Caenorhabditis briggsae genomes will be used to identify conserved sequences that do not correspond to coding regions. The second part of the proposal focuses on using profile hidden Markov models to improve functional annotation of predicted protein coding genes. HMMER, a profile HMM software package, will continue to be supported and developed, and will continue to support the PFAM database of more than 800 common protein domains. Additionally, a "simulated evolution" algorithm is proposed for increasing HMMER's sensitivity.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG001363-05
Application #: 6181623
Study Section: Genome Study Section (GNM)
Program Officer: Brooks, Lisa

Project Start: 1996-03-01
Project End: 2002-03-31
Budget Start: 2000-04-01
Budget End: 2001-03-31
Support Year: 5
Fiscal Year: 2000
Total Cost: $343,227
Indirect Cost

Institution

Name: Washington University
Department: Genetics
Type: Schools of Medicine
DUNS #: 062761671

City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Related projects


NIH 2006 R01 HG	Probabilistic models of protein and RNA structure	Eddy, Sean R. / Washington University	$128,036
NIH 2005 R01 HG	Probabilistic models of protein and RNA structure	Eddy, Sean R. / Washington University	$283,500
NIH 2004 R01 HG	Probabilistic models of protein and RNA structure	Eddy, Sean R. / Washington University	$374,607
NIH 2003 R01 HG	Probabilistic Models of Protein and RNA Structure	Eddy, Sean R. / Washington University	$566
NIH 2000 R01 HG	Probabilistic Models of Protein and RNA Structure	Eddy, Sean R. / Washington University	$343,227
NIH 1999 R01 HG	Probabilistic Models of Protein and RNA Structure	Eddy, Sean R. / Washington University
NIH 1998 R01 HG	Probabilistic Models of Protein and RNA Structure	Eddy, Sean R. / Washington University
NIH 1997 R01 HG	Probabilistic Models of Protein and RNA Structure	Eddy, Sean R. / Washington University
NIH 1996 R01 HG	Probabilistic Models of Protein and RNA Structure	Eddy, Sean R. / Washington University

Publications

Jung, Seolkyoung; Swart, Estienne C; Minx, Patrick J et al. (2011) Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res 39:7529-47

Johnson, L Steven; Eddy, Sean R; Portugaly, Elon (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11:431

Nawrocki, Eric P; Kolbe, Diana L; Eddy, Sean R (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25:1335-7

Nawrocki, Eric P; Eddy, Sean R (2007) Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol 3:e56

Dowell, Robin D; Eddy, Sean R (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400

Darnell, Jennifer C; Fraser, Claire E; Mostovetsky, Olga et al. (2005) Kissing complex RNAs mediate interaction between the Fragile-X mental retardation protein KH2 domain and brain polyribosomes. Genes Dev 19:903-18

Dowell, Robin D; Eddy, Sean R (2004) Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 5:71

Klein, Robert J; Eddy, Sean R (2003) RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics 4:44

McCutcheon, John P; Eddy, Sean R (2003) Computational identification of non-coding RNAs in Saccharomyces cerevisiae by comparative genomics. Nucleic Acids Res 31:4119-28

Zmasek, Christian M; Eddy, Sean R (2002) RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics 3:14

Showing the most recent 10 out of 22 publications
