Genome sequencing projects have provided a foundation for a new biology centered around the molecular representation of genes and proteins as sequences and structures in computers. The parallel development of genome science, bioinformatics, the Internet and desktop "supercomputers" has helped bring this revolution to academic, industry and government labs worldwide. Unfortunately, the adverse impact of database errors on experimental science is easily demonstrated. The broad, long-term objective of this proposal is to continue to improve the reliability of the genome and proteome sequences of the model organism Escherichia coli K-12, leading to a Gold Standard Reference Strain for prokaryotic organisms, especially Gram-negative pathogens. This proposal focuses on improving the accuracy of the E. coli genome and proteome.
The specific aims are: (1) to ensure the continued maintenance, improvement and expansion of EcoGene, a primary data repository for the continually revised E. coli genome and proteome sequences and their annotations; EcoGene also serves as the systematic ORF nomenclature registry for E. coli K-12 and is part of an annotation-sharing collaboration among the Coli Genetic Stock Center at Yale, the Colibri database at the Pasteur Institute, and SWISS-PROT; (2) to establish two Indexer positions for expert surveillance of electronic and legacy journals, to ensure that newly published and pre-released functional data about E. coli genes are entered promptly and accurately into EcoGene, then released to the public and to partner databases; (3) to augment electronic data collection with bioinformatics analysis to (a) discover new evolutionary relationships, thus improving functional predictions, and (b) detect and report internal and external database errors, including DNA and protein sequence errors, which are often detected and resolved during analysis-anomaly-refinement-reanalysis (AARR) cycles; and (4) to use laboratory studies to (a) resolve remaining DNA frameshift errors in the E. coli K-12 genome by re-sequencing, (b) verify ambiguous protein starts, and (c) verify the secreted (periplasmic and outer membrane) proteome. Accurate annotation of the E. coli genome is necessary in its own right, because E. coli is the best-understood cellular organism, and it provides the foundation for the analysis of bacterial genomes whose characterization will be crucial for the development of biological and chemical defenses against bacterial bioterrorism.
Liang, Wenxing; Rudd, Kenneth E; Deutscher, Murray P (2015) A role for REP sequences in regulating translation. Mol Cell 58:431-9
Peña-Soler, Esther; Fernandez, Francisco J; López-Estepa, Miguel et al. (2014) Structural analysis and mutant growth properties reveal distinctive enzymatic and cellular roles for the three major L-alanine transaminases of Escherichia coli. PLoS One 9:e102139
Zhou, Jindan; Richardson, Andrew J; Rudd, Kenneth E (2013) EcoGene-RefSeq: EcoGene tools applied to the RefSeq prokaryotic genomes. Bioinformatics 29:1917-8
Zhou, Jindan; Rudd, Kenneth E (2013) EcoGene 3.0. Nucleic Acids Res 41:D613-24
Basturea, Georgeta N; Dague, Darryl R; Deutscher, Murray P et al. (2012) YhiQ is RsmJ, the methyltransferase responsible for methylation of G1516 in 16S rRNA of E. coli. J Mol Biol 415:16-21
Zhou, Jindan; Rudd, Kenneth E (2011) Bacterial genome reengineering. Methods Mol Biol 765:3-25
Fozo, Elizabeth M; Kawano, Mitsuoki; Fontaine, Fanette et al. (2008) Repression of small toxic protein synthesis by the Sib and OhsC small RNAs. Mol Microbiol 70:1076-93
Hemm, Matthew R; Paul, Brian J; Schneider, Thomas D et al. (2008) Small membrane proteins found by comparative genomics and ribosome binding site models. Mol Microbiol 70:1487-501
Gonnet, Pedro; Rudd, Kenneth E; Lisacek, Frederique (2004) Fine-tuning the prediction of sequences cleaved by signal peptidase II: a curated set of proven and predicted lipoproteins of Escherichia coli K-12. Proteomics 4:1597-613
Shultzaberger, R K; Bucheimer, R E; Rudd, K E et al. (2001) Anatomy of Escherichia coli ribosome binding sites. J Mol Biol 313:215-28