This project will support the continued development and maintenance of four bioinformatics software systems, all of which are used to support research on gene finding and genome annotation. The most widely used of these systems, Glimmer, is used to find genes in bacteria, viruses, archaea, and simple eukaryotes. Glimmer finds over 99% of the genes in most bacteria using a fully automated procedure. A survey of the literature indicates that Glimmer has been used by the majority of bacterial, viral, and archaeal genome sequencing projects in recent years. This project will enhance Glimmer with several new features, including extensions to enable Glimmer's use on metagenomics data. The second system, MUMmer, is a highly efficient system for whole-genome alignment. In recent years MUMmer has been extended with packages that allow the user to align draft genomes, to align six-frame amino acid translations, and to find all single nucleotide changes between two genomes. This project will maintain and extend the software further, to enable multi-genome alignment and to identify major genomic events (inversions and translocations) more automatically. The third system, TransTerm, finds rho-independent transcription terminators in bacterial and archaeal genomes. TransTerm comprises a software package and a website containing predictions for hundreds of genomes, all of which are freely downloadable. This project will extend the software to search for anti-terminators and to allow it to operate on incomplete genomes, which already outnumber finished genomes and whose numbers continue to grow rapidly. The fourth system, OperonDB, includes a database and a software system that identifies operons in a collection of prokaryotic genomes using conserved synteny across species. This project will support enhancements to the software and regular updates to OperonDB.
All of the software and data generated by this project will continue to be made available for free under an open source license, allowing unrestricted use by other researchers to add new functions or to integrate the software into their own systems.

Project Narrative (Relevance)

This project supports a suite of software packages that have been extensively used in the interpretation and analysis of the genomes of many pathogenic organisms, including the bacteria that cause tuberculosis, cholera, anthrax, strep and staph infections, Lyme disease, syphilis, and many other diseases. Ongoing development of this software will be of fundamental importance in continuing research on these diseases, and new developments proposed here will be necessary to address the new challenges that will emerge from efforts to sequence the diverse bacteria that live in the human body.