9417897 Neidhardt For the past 15 years this laboratory has been assembling a gene-protein database for the bacterium Escherichia coli. These studies were made possible by the ability of two-dimensional (2D) polyacrylamide gel electrophoresis to present a more or less complete picture of the cell's complement of proteins. The ultimate goal of this project is to trace every E. coli protein to its structural gene, and to account for every gene of the cell. The 5th edition of the Gene-Protein Database contains just under 700 entries, accounting for one-sixth of the protein-encoding genes of E. coli. Most of these entries derive from methods (comigration with purified marker proteins, defective or overproducing mutants, etc.) that allow one-at-a-time identifications of protein spots on 2D gels as known proteins or the products of known genes. Early progress was rapid, but even so, many decades would be necessary to identify even the products of known genes, and there would be no hope of identifying all gene products by this approach. The recent congruence of three technical advances means that it is now technically possible for a small research group, with the right tools, to elucidate the complete complement of the genes and gene products of E. coli in a reasonable length of time. The three technical advances are (i) the Kohara library of recombinant phage carrying EcoRI-bordered fragments of the whole E. coli chromosome, (ii) the means to express selectively and totally all the genes from such segments, and (iii) the advanced state of sequencing of the E. coli chromosome. The research plan begins with the removal, one by one, of the chromosomal inserts from the Kohara phage library, and splicing them into one or another plasmid engineered to permit complete transcription of both strands of the chromosomal segment. These plasmids contain promoter sequences from phage T7 and phage T3 in opposite orientations.
This transcription will occur in a host cell that can be prevented from transcribing its own chromosomal genes and can be induced to form either the T7 or the T3 RNA polymerase. Proteins made from these transcripts can be radioactively labeled and thereby distinguished from the other cellular proteins. The labeled proteins expressed from the chromosomal fragment can then be resolved on 2D gels and assigned coordinates on the reference gel image for this organism. Each expressed protein spot can then be matched to its encoding DNA on the segment by comparing the molecular weight and the isoelectric point of the protein (as measured by its gel location) with those predicted by the nucleotide sequence of the segment. In this way a Genome Expression Map will gradually be produced that displays the 2D gel location of every protein encoded by the E. coli genome, and matches each protein spot to its gene. The Genome Expression Map will be published periodically, and maintained as a frequently updated, publicly accessible, electronic database. %%% The goal of this project is to trace every protein in the bacterium Escherichia coli to its structural gene, and to account for every gene of the cell. Most of the nearly 700 entries in the current gene-protein database derive from methods that allow one-at-a-time identifications of protein spots on two-dimensional (2D) gels as known proteins or the products of known genes. Early progress was rapid, but even so, many decades would be necessary to identify even the products of known genes, and there would be no hope of identifying all gene products by this approach. The recent congruence of three technical advances means that it is now technically possible for a small research group, with the right tools, to produce a Genome Expression Map that displays the 2D gel location of every protein encoded by the E. coli genome, and matches each protein spot to its gene.
In addition, each protein of the cell will be sorted into a general metabolic class as it is identified. The Genome Expression Map and the annotated physiological information about the cellular proteins will be published periodically, and maintained as a frequently updated, publicly accessible, electronic database. This database will greatly facilitate studies of global gene expression patterns and will be a model for work with more complex organisms. ***
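
The matching step described in the abstract, in which a labeled spot's molecular weight and isoelectric point (read from its gel position) are compared with the values predicted from the nucleotide sequence of the segment, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the laboratory's actual procedure: the names `PredictedProduct`, `Spot`, and `match_spot`, the tolerance values, the scoring rule, and the open reading frame labels and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PredictedProduct:
    """A protein predicted from the sequence of a chromosomal segment."""
    gene: str          # hypothetical open reading frame label
    mol_weight: float  # predicted molecular weight, in daltons
    pi: float          # predicted isoelectric point


@dataclass(frozen=True)
class Spot:
    """A labeled protein spot, described by values read from its 2D gel position."""
    mol_weight: float  # apparent molecular weight, in daltons
    pi: float          # apparent isoelectric point


def match_spot(
    spot: Spot,
    candidates: Sequence[PredictedProduct],
    mw_tol: float = 0.10,  # assumed tolerance: 10% relative error in molecular weight
    pi_tol: float = 0.3,   # assumed tolerance: 0.3 pH units in isoelectric point
) -> Optional[PredictedProduct]:
    """Return the closest predicted product within both tolerances, or None."""
    best: Optional[PredictedProduct] = None
    best_score: Optional[float] = None
    for cand in candidates:
        mw_err = abs(spot.mol_weight - cand.mol_weight) / cand.mol_weight
        pi_err = abs(spot.pi - cand.pi)
        if mw_err > mw_tol or pi_err > pi_tol:
            continue  # outside the accepted window on at least one axis
        # Combine both errors, each scaled by its own tolerance.
        score = mw_err / mw_tol + pi_err / pi_tol
        if best_score is None or score < best_score:
            best, best_score = cand, score
    return best


# Hypothetical predictions for one cloned segment (illustrative values only).
segment = [
    PredictedProduct("orfA", 34500.0, 5.4),
    PredictedProduct("orfB", 61200.0, 6.1),
    PredictedProduct("orfC", 18900.0, 9.2),
]

match = match_spot(Spot(mol_weight=35000.0, pi=5.5), segment)
print(match.gene if match else "unmatched")  # prints "orfA"
```

A spot that falls outside every window is left unmatched rather than forced onto the nearest gene, which seems the safer default when gel mobility and predicted values can disagree.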

Agency: National Science Foundation (NSF)
Institute: Division of Molecular and Cellular Biosciences (MCB)
Application #: 9417897
Program Officer: Susan Porter Ridley
Budget Start: 1995-01-15
Budget End: 1999-06-30
Fiscal Year: 1994
Total Cost: $807,700
Name: University of Michigan Ann Arbor
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109