This project will support the continued development and maintenance of four bioinformatics systems, all of which are used for microbial genomics research. The most widely used of these systems, Glimmer, finds genes in bacteria, viruses, archaea, and simple eukaryotes. It can find over 99% of the genes in bacteria fully automatically, and it has been used in dozens of genome annotation efforts. The system has been distributed (free, including source code) to over 1400 academic and government laboratories and institutions. This project will support these users with continued improvements, including new features that permit Glimmer's use on incomplete genomes, improved detection of start codons, and a more user-friendly interface.

The second system, PANDA, is a new system for creating non-redundant protein sequence databases, which are a key tool in genome sequence analysis. PANDA is an important resource for both prokaryotic and eukaryotic genomics research. This project will support the creation and regular updating of a comprehensive database containing proteins from all species, a specialized database of bacterial proteins, a database of mammalian proteins, and others. All databases will be freely available for download and will be regularly rebuilt with the latest genome data.

The third system, TransTerm, finds transcription terminators in microbial genomes. TransTerm has been distributed for free to over 500 laboratories, and it will be extended to find new types of terminators and to recognize anti-terminators. This project will also support the maintenance of a website that contains all terminators from the latest set of completed genomes.

The fourth system identifies operons in microbial genomes, using conserved synteny across species as the basis for its predictions. This project will support enhancements to the software and regular updates to the operon database, which must be updated to incorporate new genomes as they appear. Both the software and the operon database will be freely available to the scientific community.
Phillippy, Adam M; Schatz, Michael C; Pop, Mihai (2008) Genome assembly forensics: finding the elusive mis-assembly. Genome Biol 9:R55
Delcher, Arthur L; Bratke, Kirsten A; Powers, Edwin C et al. (2007) Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:673-9
Kingsford, Carl; Delcher, Arthur L; Salzberg, Steven L (2007) A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes. Mol Biol Evol 24:2091-8
Ghedin, Elodie; Wang, Shiliang; Spiro, David et al. (2007) Draft genome of the filarial nematode parasite Brugia malayi. Science 317:1756-60
Sommer, Daniel D; Delcher, Arthur L; Salzberg, Steven L et al. (2007) Minimus: a fast, lightweight genome assembler. BMC Bioinformatics 8:64
Kingsford, Carleton L; Ayanbule, Kunmi; Salzberg, Steven L (2007) Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol 8:R22
Carlton, Jane M; Hirt, Robert P; Silva, Joana C et al. (2007) Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315:207-12
Salzberg, Steven L; Kingsford, Carl; Cattoli, Giovanni et al. (2007) Genome analysis linking recent European and African influenza (H5N1) viruses. Emerg Infect Dis 13:713-8
Salzberg, Steven L (2007) Genome re-annotation: a wiki solution? Genome Biol 8:102
Pertea, Mihaela; Mount, Stephen M; Salzberg, Steven L (2007) A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana. BMC Bioinformatics 8:159