Bioinformatics Software for Analyzing Microbial Genomes

Salzberg, Steven

Abstract

This project will support the continued development and maintenance of four bioinformatics software systems that are widely used in research on gene finding and genome annotation. The first of these, Glimmer, is used to find genes in bacteria, viruses, archaea, and simple eukaryotes. Glimmer is highly accurate, finding over 99% of the genes in most bacteria. It has been used by thousands of scientists around the world, and in the majority of published bacterial and archaeal genome sequencing projects over the past decade. Collectively, the three main publications describing Glimmer have been cited over 2,600 times, including 400 citations in 2012 alone. Usage of Glimmer has increased in recent years due to the explosion in next-generation sequencing projects, which are particularly cost-effective for bacterial genomes. Our very recent introduction of a new version of Glimmer customized for metagenomics data is intended to make it available to microbiome researchers. Glimmer's algorithm is also the basis of PhymmBL, a new system for classifying sequences from metagenomics projects, which we will also support under this project. The second system, MUMmer, is a highly efficient system for whole-genome alignment that is widely used to compare bacterial genomes to one another and to compare genome assemblies to detect changes, both large and small. MUMmer and its components, especially Nucmer, have been widely used and have been incorporated in many other systems, including a recent multi-genome aligner, Mugsy, and several genome assembly packages. The three main publications describing MUMmer have been cited over 1,900 times, including 200 citations in 2012. A major reason for the recent increase in usage of these systems, beyond the drop in sequencing costs, is the growth of metagenomics research, particularly the human microbiome project. This project will also support two other systems, TransTermHP and OperonDB, and the web databases that accompany them.
TransTermHP finds transcription terminators in bacterial and archaeal genomes, and we have used it to build a website containing predictions for over 1,500 genomes, all of which are freely downloadable. OperonDB includes a database and a software system that identifies operons in a collection of prokaryotic genomes using conserved synteny across species. Each of these systems has been widely used and cited, and this project requests funding to rebuild the databases on a larger collection of genomes and to continue to expand them as more genomes appear. All of the software and data generated by this project will continue to be freely available under an open source license, allowing other researchers to use, modify, and redistribute them without restrictions of any kind.

Public Health Relevance

This project supports a suite of software packages that have been extensively used in the interpretation and analysis of the genomes of many pathogenic organisms, including the bacteria that cause tuberculosis, cholera, anthrax, strep and staph infections, Lyme disease, syphilis, and many others. Ongoing support and development of this software will be essential to continuing research on these diseases, and also to meeting the new challenges likely to emerge from efforts to sequence the diverse bacteria that live in the human body.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 2R01GM083873-10
Application #: 8637538
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter

Project Start: 2008-03-25
Project End: 2018-01-31
Budget Start: 2014-04-01
Budget End: 2015-01-31
Support Year: 10
Fiscal Year: 2014
Total Cost: $243,000
Indirect Cost: $93,000

Institution

Name: Johns Hopkins University
Department: Genetics
Type: Schools of Medicine
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Related projects


NIH 2017 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / Johns Hopkins University	$218,700
NIH 2016 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / Johns Hopkins University
NIH 2015 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / Johns Hopkins University	$243,000
NIH 2014 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / Johns Hopkins University	$243,000
NIH 2011 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / University of Maryland College Park	$52,161
NIH 2011 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / Johns Hopkins University	$236,609
NIH 2010 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / University of Maryland College Park	$273,983
NIH 2009 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / University of Maryland College Park	$276,750
NIH 2008 R01 GM	Bioinformatics Software for Analyzing Microbial Genomes Salzberg, Steven L. / University of Maryland College Park	$276,750

Publications

Li, Zhigang; Breitwieser, Florian P; Lu, Jennifer et al. (2018) Identifying Corneal Infections in Formalin-Fixed Specimens Using Next Generation Sequencing. Invest Ophthalmol Vis Sci 59:280-288

Pertea, Mihaela; Shumate, Alaina; Pertea, Geo et al. (2018) CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol 19:208

Breitwieser, F P; Baker, D N; Salzberg, S L (2018) KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol 19:198

Lu, Jennifer; Salzberg, Steven L (2018) Removing contaminants from databases of draft genomes. PLoS Comput Biol 14:e1006277

Luo, Ruibang; Zimin, Aleksey; Workman, Rachael et al. (2017) First Draft Genome Sequence of the Pathogenic Fungus Lomentospora prolificans (Formerly Scedosporium prolificans). G3 (Bethesda) 7:3831-3836

Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W et al. (2017) An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. Gigascience 6:1-4

Pertea, Mihaela; Kim, Daehwan; Pertea, Geo M et al. (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11:1650-67

Kim, Daehwan; Song, Li; Breitwieser, Florian P et al. (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26:1721-1729

Breitwieser, Florian P; Pardo, Carlos A; Salzberg, Steven L (2015) Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection. F1000Res 4:180

Pop, Mihai; Salzberg, Steven L (2015) Use and mis-use of supplementary material in science publications. BMC Bioinformatics 16:237

Showing the most recent 10 out of 76 publications
