Genome-scale sequencing, RNA and protein expression studies, and systematic functional characterization projects have provided a foundation for a new microbiology, supported by computational predictive analysis of gene, RNA and protein sequences and structures. The parallel development of microbial genome science, bioinformatics, Internet 2.0 and desktop supercomputers has helped bring this revolution to academic, industrial and governmental laboratories worldwide. However, the accurate and efficient management of this flood of new data, using expert curatorial oversight to create reliable information systems supporting experimental and systems biology research, has been an ongoing challenge. EcoGene.org demonstrates that a high-impact, reliable bacterial genome database can be constructed with open source tools and maintained with low overhead in an academic research environment. Two broad, long-term objectives drive EcoGene.org development: (1) the accurate, comprehensive and timely delivery of reliable, non-redundant, database-unified E. coli K-12 information vetted through an expert gatekeeper and suitable as a foundation for future interdisciplinary research, and (2) the support of other bacterial annotation and research projects through software and methods developed using E. coli as a model system.
The specific aims are: (1) to collect and organize all newly published E. coli research results while controlling the quality of the input data streams, to improve interface functionality, to expand the scope of EcoGene.org to include all E. coli strains, and to provide the public with open source code for EcoGene.org and its generic derivative ProkGene.org; (2) to bring the E. coli K-12 MG1655 GenBank genome information up to date on a monthly basis, with EcoGene.org acting as a curatorial gateway for feature, function and citation update suggestions from partner databases and the public, and to expand the datasets and documentation presented in the E. coli K-12 GenBank record; (3) to perform bioinformatics research (a) to document bioinformatics validation and annotation suites for small proteins, sRNAs, pseudogenes and 5'UTRs, and to create standardized test and training sets, and (b) to add diverse statistical algorithms into a combined pattern search tool; (4) to perform selected laboratory verification studies to (a) resolve annotation gaps and ambiguities in proteome verifications, including signal peptide/anchor discriminations, lipoproteins, and translation starts, (b) verify the genotypes/phenotypes and fill gaps in large mutant collections, and (c) precisely revert mutations acquired during laboratory maintenance to restore lost functions and phenotypes. The accurate annotation of the E. coli genome is necessary in its own right, because E. coli is the best-understood cellular organism. It is also important for facilitating continued research on E. coli K-12, the reference strain most critical for anti-bacterial strategies to defend against bacterial bio-terrorism and to protect against the increasing threat of antibiotic-resistant bacterial epidemics.

Public Health Relevance

EcoGene.org is a web site for scientists interested in the biology of Escherichia coli K-12. Even before genomes began to be sequenced, we knew more about the life of E. coli than any other organism. Now that we have all of the genes for E. coli, we have the possibility of attaining a nearly complete understanding of how a cell works in minute detail. The wealth of information collected at EcoGene.org from all over the world about the life of E. coli, both friend and foe of man, should help in achieving this

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM058560-10
Application #: 8324569
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Sledjeski, Darren D
Project Start: 2009-09-30
Project End: 2014-08-31
Budget Start: 2012-09-01
Budget End: 2014-08-31
Support Year: 10
Fiscal Year: 2012
Total Cost: $307,409
Indirect Cost: $106,488

Name: University of Miami School of Medicine
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 052780918
City: Coral Gables
State: FL
Country: United States
Zip Code: 33146
Peña-Soler, Esther; Fernandez, Francisco J; López-Estepa, Miguel et al. (2014) Structural analysis and mutant growth properties reveal distinctive enzymatic and cellular roles for the three major L-alanine transaminases of Escherichia coli. PLoS One 9:e102139
Zhou, Jindan; Richardson, Andrew J; Rudd, Kenneth E (2013) EcoGene-RefSeq: EcoGene tools applied to the RefSeq prokaryotic genomes. Bioinformatics 29:1917-8
Zhou, Jindan; Rudd, Kenneth E (2013) EcoGene 3.0. Nucleic Acids Res 41:D613-24
Basturea, Georgeta N; Dague, Darryl R; Deutscher, Murray P et al. (2012) YhiQ is RsmJ, the methyltransferase responsible for methylation of G1516 in 16S rRNA of E. coli. J Mol Biol 415:16-21
Zhou, Jindan; Rudd, Kenneth E (2011) Bacterial genome reengineering. Methods Mol Biol 765:3-25
Fozo, Elizabeth M; Kawano, Mitsuoki; Fontaine, Fanette et al. (2008) Repression of small toxic protein synthesis by the Sib and OhsC small RNAs. Mol Microbiol 70:1076-93
Hemm, Matthew R; Paul, Brian J; Schneider, Thomas D et al. (2008) Small membrane proteins found by comparative genomics and ribosome binding site models. Mol Microbiol 70:1487-501
Gonnet, Pedro; Rudd, Kenneth E; Lisacek, Frederique (2004) Fine-tuning the prediction of sequences cleaved by signal peptidase II: a curated set of proven and predicted lipoproteins of Escherichia coli K-12. Proteomics 4:1597-613
Shultzaberger, R K; Bucheimer, R E; Rudd, K E et al. (2001) Anatomy of Escherichia coli ribosome binding sites. J Mol Biol 313:215-28
Sarker, S; Rudd, K E; Oliver, D (2000) Revised translation start site for secM defines an atypical signal peptide that regulates Escherichia coli secA expression. J Bacteriol 182:5592-5