We propose to develop new algorithms for gene finding and characterization in prokaryote genomes and to apply them to the discovery of mini-genes in Pseudomonas aeruginosa. Gene-finding algorithms for prokaryote genomes typically use detailed empirical models of gene structure that do not explicitly provide information on the features that identify a coding region. Furthermore, their performance is usually evaluated by their ability to recognize known or pre-annotated genes, which makes the reliability of individual predictions difficult to evaluate for genes of unusual composition. The complexity of the underlying models and the size of genome-annotation tasks lead to uncharacterized annotation of non-conserved ORFs, which may constitute up to 70% of all annotated genes in a genome. This is a significant limitation, especially for comparative genomics, where hypothetical genes predicted by different research groups using different methods and criteria must be compared. Hence, gene prediction tools that complement existing methods by emphasizing transparency and compositional characterization should be useful for improving genome annotations, for evaluating the compositional properties of predicted genes, and for assigning confidence levels to predictions of hypothetical genes. To achieve this goal, we propose to develop new gene characterization procedures that explicitly identify and interpret gene compositional properties. Although this proposal focuses on applications to prokaryotic genomes, the method will also be well suited to the characterization of metagenomic and RNA-seq data and to the fast detection of coding regions in eukaryotic sequences.
An agile navigational bioinformatics tool will also be developed to facilitate editing of genome annotations; to identify problem regions of automated annotations; to obtain measures of significance of the predictions; and to highlight local sequence features such as start codons, ribosomal binding sites, promoter regions, and palindromic sequences. We are specifically interested in applying our approach and tools to identify short genes of unusual composition in strains of Pseudomonas aeruginosa, an opportunistic pathogen that is intensively studied at the University of Florida. The genome of P. aeruginosa is an ideal candidate for our method because of its high compositional bias and its high content of uncharacterized hypothetical genes. Unusual regulatory genes that had escaped annotation because of their small size and unusual composition have recently been identified in P. aeruginosa. Many other genes with similar properties may remain to be discovered. Preliminary applications of our method to the genome of P. aeruginosa have provided significant evidence for many newly identified small genes, providing focused targets for experimental verification and evidence of a widespread role of mini-proteins in regulating gene expression. We expect that application of our methods to the study of P. aeruginosa will result in significant improvements in our understanding of the biology of this important pathogen. We also expect that the method will be useful in a wide variety of other applications.

Public Health Relevance

The objective of this proposal is to develop gene-characterization tools for prokaryote genomes geared toward the identification of genes of unusual composition. The tools will be applied to the identification of short coding sequences ("mini-genes") in Pseudomonas aeruginosa. Our studies are relevant to the development of gene-finding methods and to understanding the role of mini-proteins in P. aeruginosa.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genetic Variation and Evolution Study Section (GVE)
Program Officer
Eckstrand, Irene A
University of Florida
Schools of Medicine
United States
Miranda, Hugo V; Antelmann, Haike; Hepowit, Nathaniel et al. (2014) Archaeal ubiquitin-like SAMP3 is isopeptide-linked to proteins via a UbaA-dependent mechanism. Mol Cell Proteomics 13:220-39
Mukherjee, Krishanu; Brocchieri, Luciano (2013) Ancient Origin of Chaperonin Gene Paralogs Involved in Ciliopathies. J Phylogenetics Evol Biol 1: