Computational Modeling of Genes and Gene Structure

Salzberg, Steven

Abstract

As the Human Genome Project accelerates, and as other eukaryotic sequencing projects near or reach completion, the need for automatic annotation of genes and other genomic sequence features is becoming increasingly urgent. Considerable research has been devoted to developing programs to find genes in genomic sequence data from both eukaryotes and prokaryotes, and related efforts have been dedicated to finding signals such as splice sites, ribosome binding sites, terminators, promoters, and other transcription factor binding sites. Programs that find genes by alignment have become a standard tool of gene finding, taking advantage of both the increasing quantity of genomic DNA sequence and large databases of expressed sequence tags (ESTs). The rapid outpouring of complete microbial genomes (16 thus far) has added tens of thousands of genes to the public databases, making gene finding by sequence comparison more effective than ever. However, experience with the most recently completed genomes shows that 30-40 percent of the genes in a new genome have no similarity to any previously catalogued gene. De novo gene prediction therefore remains a crucially important tool, and continued research is needed to improve the computational techniques underlying gene finding methods.
This project will develop state-of-the-art gene finding systems that achieve several objectives: (1) integration of the best computational techniques for gene finding, including hidden Markov models (HMMs), interpolated Markov models (IMMs), and decision trees, into a single gene finding system; (2) development of new techniques for recognizing starts of translation, splice junctions, polyadenylation sites, and other sequence patterns, and integration of these techniques into the gene finder; (3) integration of sequence homology information into the gene finding algorithm; (4) development of a platform-independent display tool that shows all annotation, including alternative interpretations of a sequence; and (5) release of this software for free use by the genomics research community. The approach to developing these bioinformatics tools will be interdisciplinary, building on the PI's experience in computer science and his NIH-supported training in genomics.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 3R01LM006845-03S1
Application #: 6497066
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane

Project Start: 1999-09-01
Project End: 2002-12-31
Budget Start: 2001-07-01
Budget End: 2002-12-31
Support Year: 3
Fiscal Year: 2001
Total Cost: $41,267

Institution

Name: Institute for Genomic Research

City: Rockville
State: MD
Country: United States
Zip Code: 20850

Publications

Magoc, Tanja; Wood, Derrick; Salzberg, Steven L (2013) EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evol Bioinform Online 9:127-36

Schatz, Michael C; Phillippy, Adam M; Sommer, Daniel D et al. (2013) Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies. Brief Bioinform 14:213-24

Salzberg, Steven L; Phillippy, Adam M; Zimin, Aleksey et al. (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 22:557-67

Walenz, Brian; Florea, Liliana (2011) Sim4db and Leaff: utilities for fast batch spliced alignment and sequence indexing. Bioinformatics 27:1869-70

Lipman, David; Flicek, Paul; Salzberg, Steven et al. (2011) Closure of the NCBI SRA and implications for the long-term future of genomics data storage. Genome Biol 12:402

Magoc, Tanja; Salzberg, Steven L (2011) FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27:2957-63

Shulaev, Vladimir; Sargent, Daniel J; Crowhurst, Ross N et al. (2011) The genome of woodland strawberry (Fragaria vesca). Nat Genet 43:109-16

Florea, Liliana; Souvorov, Alexander; Kalbfleisch, Theodore S et al. (2011) Genome assembly has a major impact on gene content: a comparison of annotation in two Bos taurus assemblies. PLoS One 6:e21400

Angiuoli, Samuel V; Salzberg, Steven L (2011) Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics 27:334-42

Brady, Arthur; Salzberg, Steven (2011) PhymmBL expanded: confidence scores, custom databases, parallelization and more. Nat Methods 8:367

Showing the most recent 10 out of 112 publications
