This project will support the continued development and maintenance of four bioinformatics software systems, all of which are used to support research on gene finding and genome annotation. The most widely used of these systems, Glimmer, is used to find genes in bacteria, viruses, archaea, and simple eukaryotes. Glimmer finds over 99% of the genes in most bacteria using a fully automated procedure. A survey of recent literature indicates that Glimmer has been used by the majority of bacterial, viral, and archaeal genome sequencing projects in recent years. This project will enhance Glimmer with several new features, including extensions to enable Glimmer's use on metagenomics data. The second system, MUMmer, is a highly efficient system for whole-genome alignment. In recent years MUMmer has been extended with packages that allow the user to align draft genomes, to align six-frame amino acid translations, and to find all single nucleotide changes between two genomes. This project will maintain and extend the software further, to enable multi-genome alignment and to identify major genomic events (inversions and translocations) more automatically. The third system, TransTerm, finds rho-independent transcription terminators in bacterial and archaeal genomes. TransTerm comprises a software package and a website containing predictions for hundreds of genomes, all of which are freely downloadable. This project will extend the software to search for anti-terminators, and to allow it to operate on incomplete genomes, which already outnumber finished genomes and whose numbers continue to grow rapidly. The fourth system, OperonDB, includes a database and a software system that identifies operons in a collection of prokaryotic genomes using conserved synteny across species. This project will support enhancements to the software and regular updates to OperonDB.
All of the software and data generated by this project will continue to be made available for free under an open source license, allowing unrestricted use by other researchers to add new functions or to integrate the software into their own systems.

Project Narrative (Relevance)
This project supports a suite of software packages that have been extensively used in the interpretation and analysis of many pathogenic organisms, including the bacteria that cause tuberculosis, cholera, anthrax, strep and staph infections, Lyme disease, syphilis, and many other diseases. Ongoing development of this software will be of fundamental importance in continuing research on these diseases, and new developments proposed here will be necessary to address the new challenges that will emerge from efforts to sequence the diverse bacteria that live in the human body.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM083873-07
Application #
7779520
Study Section
Special Emphasis Panel (ZRG1-BST-Q (01))
Program Officer
Anderson, James J
Project Start
2008-03-25
Project End
2012-02-29
Budget Start
2010-03-01
Budget End
2011-02-28
Support Year
7
Fiscal Year
2010
Total Cost
$273,983
Indirect Cost
Name
University of Maryland College Park
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
790934285
City
College Park
State
MD
Country
United States
Zip Code
20742
Li, Zhigang; Breitwieser, Florian P; Lu, Jennifer et al. (2018) Identifying Corneal Infections in Formalin-Fixed Specimens Using Next Generation Sequencing. Invest Ophthalmol Vis Sci 59:280-288
Pertea, Mihaela; Shumate, Alaina; Pertea, Geo et al. (2018) CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol 19:208
Breitwieser, F P; Baker, D N; Salzberg, S L (2018) KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol 19:198
Lu, Jennifer; Salzberg, Steven L (2018) Removing contaminants from databases of draft genomes. PLoS Comput Biol 14:e1006277
Luo, Ruibang; Zimin, Aleksey; Workman, Rachael et al. (2017) First Draft Genome Sequence of the Pathogenic Fungus Lomentospora prolificans (Formerly Scedosporium prolificans). G3 (Bethesda) 7:3831-3836
Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W et al. (2017) An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. Gigascience 6:1-4
Pertea, Mihaela; Kim, Daehwan; Pertea, Geo M et al. (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11:1650-67
Kim, Daehwan; Song, Li; Breitwieser, Florian P et al. (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26:1721-1729
Breitwieser, Florian P; Pardo, Carlos A; Salzberg, Steven L (2015) Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection. F1000Res 4:180
Pop, Mihai; Salzberg, Steven L (2015) Use and mis-use of supplementary material in science publications. BMC Bioinformatics 16:237

Showing the most recent 10 out of 76 publications