Predicting Gene Structure - Vertebrate Genome Comparison

Brent, Michael

Abstract

A steadily increasing proportion of biomedical research is conducted on organisms for which genomic sequence is available. For many research questions, however, a genome is important primarily because of its protein products. Thus, a critical question in genome analysis is: What are the structures of all the genes and the exact amino acid sequences of their translation products? Despite significant contributions from experimental biology, high-throughput sequencing, and bioinformatics, we are still far from being able to answer this question accurately. The proposed research aims to improve gene-structure prediction by exploiting patterns of evolutionary conservation.
Aim 1 focuses on developing probability models for exploiting genomic homology to improve gene-structure prediction. A novel aspect of the proposed models is their use of "conservation sequence" to represent the degree and pattern of evolutionary conservation at each nucleotide in the genome to be annotated. A conservation sequence is a synthesis of potentially overlapping local alignments into one sequence. Our probability models build on the Hidden Markov Model (HMM) approach used in state-of-the-art gene-structure prediction systems.
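As an illustration of the conservation-sequence idea in Aim 1, the following minimal Python sketch collapses potentially overlapping local alignments into one symbol per target nucleotide. The abstract does not specify the alphabet or how overlaps are resolved, so everything here is an assumption made for illustration: the three symbols (unaligned, match, mismatch), the rule that the highest-scoring alignment wins where alignments overlap, the treatment of informant gaps as mismatches, and the names LocalAlignment and conservation_sequence.

```python
from dataclasses import dataclass

# Assumed three-symbol alphabet (illustrative, not taken from the abstract).
UNALIGNED, MATCH, MISMATCH = ".", "|", ":"


@dataclass
class LocalAlignment:
    """One local alignment between the target genome and an informant genome."""
    start: int       # 0-based target coordinate of the first aligned target base
    target: str      # aligned target row, '-' marks a gap
    informant: str   # aligned informant row, '-' marks a gap
    score: float     # alignment score, used to resolve overlaps


def conservation_sequence(target_length, alignments):
    """Return one conservation symbol per target nucleotide.

    Assumption: where alignments overlap, the highest-scoring alignment
    decides the symbol; lower-scoring ones only fill unaligned positions.
    """
    cons = [UNALIGNED] * target_length
    for aln in sorted(alignments, key=lambda a: a.score, reverse=True):
        pos = aln.start
        for t, q in zip(aln.target, aln.informant):
            if t == "-":
                continue  # gap in the target row: no target base is consumed
            if pos >= target_length:
                break  # ignore columns that run past the end of the target
            if cons[pos] == UNALIGNED:
                # An informant gap ('-') never equals a base, so it is a mismatch.
                cons[pos] = MATCH if t.upper() == q.upper() else MISMATCH
            pos += 1
    return "".join(cons)


if __name__ == "__main__":
    alignments = [
        LocalAlignment(start=2, target="ACGT", informant="ACTT", score=50.0),
        LocalAlignment(start=4, target="GT-AC", informant="GTTA-", score=30.0),
    ]
    print(conservation_sequence(10, alignments))  # prints ..||:||:..
```

In a sketch like this, the resulting string has the same length as the target sequence, so an HMM can be given one nucleotide and one conservation symbol at each position.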
Aim 2 focuses on developing probability models for improving gene prediction by exploiting cDNA and EST alignments. The fundamental approach is similar to that of Aim 1. The most important new question is how to combine information from transcript alignments with information from genomic homology in a way that does not count the same evidence twice.
Aim 3 focuses on analysis of vertebrate genomes using homology from multiple vertebrate genomes. The best method is expected to depend on the evolutionary distances among the genomes. Our investigations will focus on gene-structure prediction in human, with homology provided by (a) both mouse and pufferfish, and (b) both mouse and rat. Our gene-structure predictions will be provided to the research community through our web site and that of our collaborators at Ensembl. The ability to predict complete gene structures reliably would constitute significant progress in high-throughput biology. Potential biomedical applications include (1) identifying novel protein families that could serve as drug targets, and (2) accelerating positional cloning projects for the identification of disease-related genes.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG002278-01A2
Application #: 6473279
Study Section: Genome Study Section (GNM)
Program Officer: Good, Peter J

Project Start: 2002-04-19
Project End: 2005-03-31
Budget Start: 2002-04-19
Budget End: 2003-03-31
Support Year: 1
Fiscal Year: 2002
Total Cost: $398,500
Indirect Cost

Institution

Name: Washington University
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 062761671

City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Related projects


NIH 2008 R01 HG	Predicting Gene Structure: Vertebrate Genome Comparison	Brent, Michael R. / Washington University	$289,575
NIH 2007 R01 HG	Predicting Gene Structure: Vertebrate Genome Comparison	Brent, Michael R. / Washington University	$295,427
NIH 2006 R01 HG	Predicting Gene Structure: Vertebrate Genome Comparison	Brent, Michael R. / Washington University	$305,250
NIH 2004 R01 HG	Predicting Gene Structure - Vertebrate Genome Comparison	Brent, Michael R. / Washington University	$306,000
NIH 2003 R01 HG	Predicting Gene Structure - Vertebrate Genome Comparison	Brent, Michael R. / Washington University	$295,400
NIH 2002 R01 HG	Predicting Gene Structure - Vertebrate Genome Comparison	Brent, Michael R. / Washington University	$398,500

Publications

Brent, Michael R (2008) Steady progress and recent breakthroughs in the accuracy of automated genome annotation. Nat Rev Genet 9:62-73

Tenney, Aaron E; Wu, Jia Qian; Langton, Laura et al. (2007) A tale of two templates: automatically resolving double traces has many applications, including efficient PCR-based elucidation of alternative splices. Genome Res 17:212-8

Keibler, Evan; Arumugam, Manimozhiyan; Brent, Michael R (2007) The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal decoding for ordinary, generalized and pair HMMs. Bioinformatics 23:545-54

van Baren, Marijke J; Koebbe, Brian C; Brent, Michael R (2007) Using N-SCAN or TWINSCAN to predict gene structures in genomic DNA sequences. Curr Protoc Bioinformatics Chapter 4:Unit 4.8

Brent, Michael R (2007) How does eukaryotic gene prediction work? Nat Biotechnol 25:883-5

Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H et al. (2006) Pairagon+N-SCAN_EST: a model-based gene annotation pipeline. Genome Biol 7 Suppl 1:S5.1-10

Flicek, Paul; Brent, Michael R (2006) Using several pair-wise informant sequences for de novo prediction of alternatively spliced transcripts. Genome Biol 7 Suppl 1:S8.1-9

Gross, Samuel S; Brent, Michael R (2006) Using multiple alignments to improve gene prediction. J Comput Biol 13:379-93

van Baren, Marijke J; Brent, Michael R (2006) Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res 16:678-85

Wei, Chaochun; Brent, Michael R (2006) Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics 7:327

Showing the most recent 10 out of 21 publications
