Computational Gene Modeling and Genome Sequence Assembly

Salzberg, Steven

Abstract

This project will address two major bioinformatics problems: the development of new and improved software for finding genes in eukaryotic genome sequences, and the development of a sequence assembler capable of assembling very large genomes. The gene-finding project will pursue three tracks. First, we will improve our existing eukaryotic gene finder, GlimmerM, adding the ability to recognize new sequence patterns and making the system easier to adapt to new organisms. Second, we will develop a new gene finder based on Pair Hidden Markov Models (PHMMs), which will use the sequence similarity between two related organisms to find genes in both species simultaneously. Third, we will develop a system that integrates the output of multiple gene finders and sequence alignment programs to produce gene models incorporating all available evidence. The assembler project will include the development of several major components. The overall goal is to build a sequence assembler able to assemble data from whole-genome shotgun sequencing projects for genomes ranging from a few million base pairs up to billions of base pairs. The assembler will accept as input either raw sequencing reads alone or a mixture of reads and already-assembled sequences. A separate scaffold-building program will create larger scaffolds from a set of assemblies by using information from paired-end sequences. In addition, this project will develop and distribute a genome assembler benchmark set containing sequences from shotgun sequencing projects for which the correct assembly is known. For all of the software development projects, the source code will be made freely available to investigators in the scientific research community worldwide.
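The scaffolding step in the abstract, in which paired-end reads link separately assembled contigs into larger scaffolds, can be illustrated with a minimal Python sketch. It assumes each read pair has already been mapped, so the input is just the pair of contig names its two mates landed on. The function names `bundle_links` and `group_scaffolds` and the `min_links` threshold are hypothetical and are not taken from the project's software. The sketch only decides which contigs belong together: it ignores read orientation, contig order, and the gap-size estimates a real scaffolder derives from insert lengths.

```python
from collections import Counter


def bundle_links(mate_hits, min_links=2):
    """Count read pairs whose two mates land on different contigs.

    Hypothetical helper, not the project's scaffolder.
    mate_hits: iterable of (contig_of_read1, contig_of_read2) tuples.
    Returns {(a, b): count} for contig pairs supported by at least
    min_links pairs; each key is stored sorted so (a, b) and (b, a) merge.
    """
    counts = Counter()
    for c1, c2 in mate_hits:
        if c1 != c2:  # pairs inside one contig carry no linking information
            counts[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}


def group_scaffolds(contigs, links):
    """Group contigs into scaffolds as connected components of the link graph.

    Uses union-find; order and orientation within a scaffold are not resolved.
    """
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two scaffolds

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    hits = [("ctg1", "ctg2"), ("ctg2", "ctg1"), ("ctg2", "ctg3"),
            ("ctg3", "ctg2"), ("ctg3", "ctg4"), ("ctg5", "ctg5")]
    links = bundle_links(hits)
    print(links)  # {('ctg1', 'ctg2'): 2, ('ctg2', 'ctg3'): 2}
    print(group_scaffolds(["ctg1", "ctg2", "ctg3", "ctg4", "ctg5"], links))
    # [['ctg1', 'ctg2', 'ctg3'], ['ctg4'], ['ctg5']]
```

In the example, the single pair joining ctg3 and ctg4 falls below the illustrative two-pair cutoff, so ctg4 stays in its own scaffold. Requiring several independent pairs per link is a common guard against chimeric or mis-mapped reads; the exact cutoff here is an assumption.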

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM006845-05
Application #: 6663116
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane

Project Start: 1999-09-01
Project End: 2006-09-29
Budget Start: 2003-09-30
Budget End: 2004-09-29
Support Year: 5
Fiscal Year: 2003
Total Cost: $652,539

Institution

Name: Institute for Genomic Research
DUNS #: 795140805

City: Rockville
State: MD
Country: United States
Zip Code: 20850

Publications

Magoč, Tanja; Wood, Derrick; Salzberg, Steven L (2013) EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evol Bioinform Online 9:127-36

Schatz, Michael C; Phillippy, Adam M; Sommer, Daniel D et al. (2013) Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies. Brief Bioinform 14:213-24

Salzberg, Steven L; Phillippy, Adam M; Zimin, Aleksey et al. (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 22:557-67

Stein, Daniel C; Miller, Clinton J; Bhoopalan, Senthil V et al. (2011) Sequence-based predictions of lipooligosaccharide diversity in the Neisseriaceae and their implication in pathogenicity. PLoS One 6:e18923

Rasko, David A; Worsham, Patricia L; Abshire, Terry G et al. (2011) Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation. Proc Natl Acad Sci U S A 108:5027-32

Bogdanove, Adam J; Koebnik, Ralf; Lu, Hong et al. (2011) Two new complete genome sequences offer insight into host and tissue specificity of plant pathogenic Xanthomonas spp. J Bacteriol 193:5450-64

Pertea, Mihaela; Pertea, Geo M; Salzberg, Steven L (2011) Detection of lineage-specific evolutionary changes among primate species. BMC Bioinformatics 12:274

Walenz, Brian; Florea, Liliana (2011) Sim4db and Leaff: utilities for fast batch spliced alignment and sequence indexing. Bioinformatics 27:1869-70

Lipman, David; Flicek, Paul; Salzberg, Steven et al. (2011) Closure of the NCBI SRA and implications for the long-term future of genomics data storage. Genome Biol 12:402

Magoč, Tanja; Salzberg, Steven L (2011) FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27:2957-63

The 10 most recent of 112 publications are listed.
