This project will support the continued development and maintenance of four bioinformatics systems, all of which are used for microbial genomics research.

The most widely used of these systems, Glimmer, is used to find genes in bacteria, viruses, archaea, and simple eukaryotes. It can find over 99% of the genes in bacteria fully automatically, and it has been used as part of dozens of genome annotation efforts. The system has been distributed (free, including source code) to over 1400 academic and government laboratories and institutions. This project will support these users with continued improvements that include new features to permit Glimmer's use on incomplete genomes, improved detection of start codons, and a more user-friendly interface.

The second system, PANDA, is a new system for creating non-redundant protein sequence databases, which are a key tool in genome sequence analysis. PANDA is an important resource for both prokaryotic and eukaryotic genomics research. This project will support the creation and regular updates of a comprehensive database containing proteins from all species, a specialized database of bacterial proteins, a database of mammalian proteins, and others. All databases will be freely available for download and will be regularly rebuilt with the latest genome data.

The third system, TransTerm, finds transcription terminators in microbial genomes. TransTerm has been distributed for free to over 500 laboratories, and it will be extended to find new types of terminators and to recognize anti-terminators. This project will also support the maintenance of a website that contains all terminators from the latest set of completed genomes.

The fourth system identifies operons in microbial genomes, using conserved synteny across species as the basis for its predictions. This project will support enhancements to the software and regular updates to the operon database, which needs to be modified to incorporate new genomes as they appear. Both the software and the operon database will be freely available to the scientific community.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM007938-04
Application #: 7126512
Study Section: Genome Study Section (GNM)
Program Officer: Ye, Jane
Project Start: 2003-09-01
Project End: 2008-02-29
Budget Start: 2006-09-01
Budget End: 2008-02-29
Support Year: 4
Fiscal Year: 2006
Total Cost: $181,264
Indirect Cost:

Name: University of Maryland College Park
Department:
Type: Organized Research Units
DUNS #: 790934285
City: College Park
State: MD
Country: United States
Zip Code: 20742

Phillippy, Adam M; Schatz, Michael C; Pop, Mihai (2008) Genome assembly forensics: finding the elusive mis-assembly. Genome Biol 9:R55
Delcher, Arthur L; Bratke, Kirsten A; Powers, Edwin C et al. (2007) Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:673-9
Kingsford, Carl; Delcher, Arthur L; Salzberg, Steven L (2007) A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes. Mol Biol Evol 24:2091-8
Ghedin, Elodie; Wang, Shiliang; Spiro, David et al. (2007) Draft genome of the filarial nematode parasite Brugia malayi. Science 317:1756-60
Sommer, Daniel D; Delcher, Arthur L; Salzberg, Steven L et al. (2007) Minimus: a fast, lightweight genome assembler. BMC Bioinformatics 8:64
Kingsford, Carleton L; Ayanbule, Kunmi; Salzberg, Steven L (2007) Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol 8:R22
Carlton, Jane M; Hirt, Robert P; Silva, Joana C et al. (2007) Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315:207-12
Salzberg, Steven L; Kingsford, Carl; Cattoli, Giovanni et al. (2007) Genome analysis linking recent European and African influenza (H5N1) viruses. Emerg Infect Dis 13:713-8
Salzberg, Steven L (2007) Genome re-annotation: a wiki solution? Genome Biol 8:102
Pertea, Mihaela; Mount, Stephen M; Salzberg, Steven L (2007) A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana. BMC Bioinformatics 8:159

Showing the most recent 10 out of 21 publications