This project will support the continued development and maintenance of four bioinformatics software systems, all of which support research on gene finding and genome annotation. The most widely used of these systems, Glimmer, finds genes in bacteria, viruses, archaea, and simple eukaryotes. Glimmer finds over 99% of the genes in most bacteria using a fully automated procedure. A survey of recent literature indicates that Glimmer has been used by the majority of bacterial, viral, and archaeal genome sequencing projects in recent years. This project will enhance Glimmer with several new features, including extensions to enable Glimmer's use on metagenomics data. The second system, MUMmer, is a highly efficient system for whole-genome alignment. In recent years MUMmer has been extended with packages that allow the user to align draft genomes, to align six-frame amino acid translations, and to find all single nucleotide changes between two genomes. This project will maintain and extend the software further, to enable multi-genome alignment and to identify major genomic events (inversions and translocations) with greater automation. The third system, TransTerm, finds rho-independent transcription terminators in bacterial and archaeal genomes. TransTerm comprises a software package and a website containing predictions for hundreds of genomes, all of which are freely downloadable. This project will extend the software to search for anti-terminators and to operate on incomplete genomes, which already outnumber finished genomes and whose numbers continue to grow rapidly. The fourth system, OperonDB, includes a database and a software system that identifies operons in a collection of prokaryotic genomes using conserved synteny across species. This project will support enhancements to the software and regular updates to OperonDB.
All of the software and data generated by this project will continue to be made available for free under an open source license, allowing unrestricted use by other researchers to add new functions or to integrate the software into their own systems.

Project Narrative (Relevance)
This project supports a suite of software packages that have been extensively used in the analysis and interpretation of the genomes of many pathogenic organisms, including the bacteria that cause tuberculosis, cholera, anthrax, strep and staph infections, Lyme disease, syphilis, and many other diseases. Ongoing development of this software will be of fundamental importance in continuing research on these diseases, and the new developments proposed here will be necessary to address the new challenges that will emerge from efforts to sequence the diverse bacteria that live in the human body.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-BST-Q (01))
Program Officer: Anderson, James J
University of Maryland College Park
Biostatistics & Other Math Sci
Schools of Arts and Sciences
College Park
United States
Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W et al. (2017) An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. Gigascience 6:1-4
Pertea, Mihaela; Kim, Daehwan; Pertea, Geo M et al. (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11:1650-67
Kim, Daehwan; Song, Li; Breitwieser, Florian P et al. (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26:1721-1729
Breitwieser, Florian P; Pardo, Carlos A; Salzberg, Steven L (2015) Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection. F1000Res 4:180
Pop, Mihai; Salzberg, Steven L (2015) Use and mis-use of supplementary material in science publications. BMC Bioinformatics 16:237
Martinson, Vincent G; Magoc, Tanja; Koch, Hauke et al. (2014) Genomic features of a bumble bee symbiont reflect its host environment. Appl Environ Microbiol 80:3793-803
Wood, Derrick E; Salzberg, Steven L (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46
Merchant, Samier; Wood, Derrick E; Salzberg, Steven L (2014) Unexpected cross-species contamination in genome sequencing projects. PeerJ 2:e675
Salzberg, Steven L; Pertea, Mihaela; Fahrner, Jill A et al. (2014) DIAMUND: direct comparison of genomes to detect mutations. Hum Mutat 35:283-8
Magoc, Tanja; Wood, Derrick; Salzberg, Steven L (2013) EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evol Bioinform Online 9:127-36

Showing the most recent 10 out of 71 publications