Computational Gene Modeling and Genome Sequence Assembly

Salzberg, Steven

Abstract

This project addresses two major bioinformatics problems: the development of better software for finding genes in eukaryotic genome sequences, and the development of genome assemblers for large shotgun sequencing projects.

The gene finding project will pursue two tracks. First, we will continue to improve our Generalized Hidden Markov Model and our Pair Hidden Markov Model gene finders, training them for new species as new genomes appear, and enhancing their capabilities to use related species as a guide to gene finding in a new species. Second, we will develop a new eukaryotic annotation pipeline, which will integrate the results from a wide range of sources, including gene finders, protein sequence alignments, cDNA and EST alignments, and other sequence features. This pipeline will be used to predict comprehensive gene sets for multiple species, focusing especially on species for which the available annotation is incomplete or outdated. The pipeline will also be available as a service to annotate genomes for other groups.

The assembler project will include several major efforts. First, we will continue to build on our successful open-source assembler project, AMOS, adding new modules to allow interoperation with other assembly packages. Second, we will develop new assemblers that can handle pyrosequencing data, low-coverage genome projects, and sequences collected from complex mixtures of species. Third, we will provide assembly services to genome sequencing centers and other collaborators, helping them to assemble genomes using the latest available assembly tools. These will include new sequencing projects as well as genomes that, although already sequenced, can be re-assembled more accurately using improved assembly software.

For all of the software development projects, we will continue our practice of making all our source code freely available to investigators in the scientific research community worldwide.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 3R01LM006845-10S1
Application #: 7864735
Study Section: Special Emphasis Panel (ZLM1-ZH-S (J2))
Program Officer: Ye, Jane

Project Start: 2009-08-01
Project End: 2011-06-30
Budget Start: 2011-07-01
Budget End: 2012-05-31
Support Year: 10
Fiscal Year: 2009
Total Cost: $140,994

Institution

Name: University of Maryland College Park
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 790934285

City: College Park
State: MD
Country: United States
Zip Code: 20742

Publications

Magoc, Tanja; Wood, Derrick; Salzberg, Steven L (2013) EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evol Bioinform Online 9:127-36

Schatz, Michael C; Phillippy, Adam M; Sommer, Daniel D et al. (2013) Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies. Brief Bioinform 14:213-24

Salzberg, Steven L; Phillippy, Adam M; Zimin, Aleksey et al. (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 22:557-67

Stein, Daniel C; Miller, Clinton J; Bhoopalan, Senthil V et al. (2011) Sequence-based predictions of lipooligosaccharide diversity in the Neisseriaceae and their implication in pathogenicity. PLoS One 6:e18923

Rasko, David A; Worsham, Patricia L; Abshire, Terry G et al. (2011) Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation. Proc Natl Acad Sci U S A 108:5027-32

Bogdanove, Adam J; Koebnik, Ralf; Lu, Hong et al. (2011) Two new complete genome sequences offer insight into host and tissue specificity of plant pathogenic Xanthomonas spp. J Bacteriol 193:5450-64

Pertea, Mihaela; Pertea, Geo M; Salzberg, Steven L (2011) Detection of lineage-specific evolutionary changes among primate species. BMC Bioinformatics 12:274

Walenz, Brian; Florea, Liliana (2011) Sim4db and Leaff: utilities for fast batch spliced alignment and sequence indexing. Bioinformatics 27:1869-70

Lipman, David; Flicek, Paul; Salzberg, Steven et al. (2011) Closure of the NCBI SRA and implications for the long-term future of genomics data storage. Genome Biol 12:402

Magoč, Tanja; Salzberg, Steven L (2011) FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27:2957-63

Showing the most recent 10 out of 112 publications
