Predicting Gene Structure: Vertebrate Genome Comparison

Brent, Michael

Abstract

De novo gene prediction is the automated identification of gene structures using genome sequences as the only inputs. We propose to continue a project that has significantly improved the accuracy of de novo gene prediction in vertebrates. When we started, GENSCAN predicted a correct exon-intron structure throughout one open reading frame (ORF) at only 10% of human gene loci. We have now published systems that predict a correct ORF at 35% of human loci. RT-PCR and sequencing of our predictions have verified hundreds of new human genes. With this renewal we aim to continue driving improvements in the accuracy of vertebrate gene prediction and its utility for biomedical applications.
Aim 1: Improve the accuracy of gene structure prediction in vertebrates

A. Develop improved models of informative patterns in multi-genome alignments. Comparing the sequences of multiple vertebrate genomes should allow us to estimate the degree and pattern of selection at each site, leading to more accurate gene predictions. We propose a robust approach based on learning the patterns that exist in real alignment columns, even if they are due in part to sequencing, alignment, and assembly errors. The proposed model is a generalization of our successful TWINSCAN gene predictor. In preliminary studies its accuracy surpassed that of any previous gene prediction system for human.

B. Develop improved models of informative patterns in the target DNA sequence. We propose to systematically model regularities in gene structure that were previously considered too rare or elusive to be worthy of attention, such as splicing enhancers and suppressors, correlations between intron length and splice site sequence, and differential patterns of repeat insertion in introns versus non-transcribed regions.
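As a rough illustration of what "learning the patterns that exist in real alignment columns" could mean, the sketch below counts how often each alignment column occurs in labeled coding and non-coding examples and turns the counts into log-odds scores. This is not the model proposed in the abstract, which generalizes TWINSCAN; the function names, the add-one pseudocount, and the toy alignments are all hypothetical and chosen only to keep the example small.

```python
import math
from collections import Counter


def columns_of(alignment):
    """Turn equal-length aligned rows (one per genome) into column tuples."""
    return list(zip(*alignment))


def column_log_odds(coding_cols, noncoding_cols, pseudocount=1.0):
    """Estimate log P(column | coding) - log P(column | non-coding).

    Columns are counted exactly as observed, so gaps and apparent
    errors are learned as patterns rather than filtered out.
    The pseudocount (an assumption) keeps unseen patterns finite.
    """
    coding = Counter(coding_cols)
    noncoding = Counter(noncoding_cols)
    patterns = set(coding) | set(noncoding)
    total_coding = sum(coding.values()) + pseudocount * len(patterns)
    total_noncoding = sum(noncoding.values()) + pseudocount * len(patterns)
    return {
        p: math.log((coding[p] + pseudocount) / total_coding)
        - math.log((noncoding[p] + pseudocount) / total_noncoding)
        for p in patterns
    }


def score_region(columns, log_odds):
    """Sum per-column log-odds; columns never seen in training score 0."""
    return sum(log_odds.get(col, 0.0) for col in columns)


# Toy training data (hypothetical): three aligned genomes per region.
coding_example = columns_of(["ATGGCC", "ATGGCT", "ATGGCC"])
noncoding_example = columns_of(["ATTTAC", "A-TCAG", "GTT-AC"])

log_odds = column_log_odds(coding_example, noncoding_example)
print(score_region(columns_of(["GGC", "GGC", "GGC"]), log_odds))
```

A positive total suggests the region looks more like the coding examples than the non-coding ones. A real predictor would combine such scores with a model of the target sequence and of gene structure rather than score columns independently.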
Aim 2: Develop and maintain software, web server, and genome annotations

Our goal is to improve scientific understanding and human health by providing more accurate gene predictions to the biomedical research community. Therefore, we will develop high quality, open source software, parameter sets for a variety of genomes, and a web server where users can submit sequences for annotation. Finally, we will distribute and display annotation for each new assembly of every vertebrate genome.

This project will result in open source software that predicts exon-intron structures in vertebrate genomes more accurately than any current system. It will also increase the sensitivity and specificity of gene verification by RT-PCR.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG002278-06
Application #: 7391629
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Good, Peter J

Project Start: 2000-08-01
Project End: 2010-03-31
Budget Start: 2008-04-01
Budget End: 2010-03-31
Support Year: 6
Fiscal Year: 2008
Total Cost: $289,575
Indirect Cost

Institution

Name: Washington University
Department: Genetics
Type: Schools of Medicine
DUNS #: 068552207

City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Related projects


NIH 2008 R01 HG	Predicting Gene Structure: Vertebrate Genome Comparison Brent, Michael R. / Washington University	$289,575
NIH 2007 R01 HG	Predicting Gene Structure: Vertebrate Genome Comparison Brent, Michael R. / Washington University	$295,427
NIH 2006 R01 HG	Predicting Gene Structure: Vertebrate Genome Comparison Brent, Michael R. / Washington University	$305,250
NIH 2004 R01 HG	Predicting Gene Structure - Vertebrate Genome Comparison Brent, Michael R. / Washington University	$306,000
NIH 2003 R01 HG	Predicting Gene Structure - Vertebrate Genome Comparison Brent, Michael R. / Washington University	$295,400
NIH 2002 R01 HG	Predicting Gene Structure - Vertebrate Genome Comparison Brent, Michael R. / Washington University	$398,500

Publications

Brent, Michael R (2008) Steady progress and recent breakthroughs in the accuracy of automated genome annotation. Nat Rev Genet 9:62-73

Tenney, Aaron E; Wu, Jia Qian; Langton, Laura et al. (2007) A tale of two templates: automatically resolving double traces has many applications, including efficient PCR-based elucidation of alternative splices. Genome Res 17:212-8

Keibler, Evan; Arumugam, Manimozhiyan; Brent, Michael R (2007) The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal decoding for ordinary, generalized and pair HMMs. Bioinformatics 23:545-54

van Baren, Marijke J; Koebbe, Brian C; Brent, Michael R (2007) Using N-SCAN or TWINSCAN to predict gene structures in genomic DNA sequences. Curr Protoc Bioinformatics Chapter 4:Unit 4.8

Brent, Michael R (2007) How does eukaryotic gene prediction work? Nat Biotechnol 25:883-5

Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H et al. (2006) Pairagon+N-SCAN_EST: a model-based gene annotation pipeline. Genome Biol 7 Suppl 1:S5.1-10

Flicek, Paul; Brent, Michael R (2006) Using several pair-wise informant sequences for de novo prediction of alternatively spliced transcripts. Genome Biol 7 Suppl 1:S8.1-9

Gross, Samuel S; Brent, Michael R (2006) Using multiple alignments to improve gene prediction. J Comput Biol 13:379-93

van Baren, Marijke J; Brent, Michael R (2006) Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res 16:678-85

Wei, Chaochun; Brent, Michael R (2006) Using ESTs to improve the accuracy of de novo gene prediction. BMC Bioinformatics 7:327

Showing the most recent 10 out of 21 publications
