The E. coli genome contains approximately 4000 genes and is currently over 75% sequenced. A complete high-resolution restriction map for the entire genome is available. This information has been collected and organized into a cohesive information base, unifying the efforts of many laboratories into a single data resource. This project includes software development, database development, and data analysis. EcoSeq8 contigs, DNA sequence data, and genomic restriction map data have been compiled and analyzed to determine the number and distribution of genomic restriction sites, repeated patterns in DNA sequences, the distribution and categorization of proteins encoded in the genome, the assignment of genes to individual clones in the ordered clone set of the E. coli genome, and the presence of putative new genes in the DNA sequence flanking known genes. Several new protein structural motifs have been identified by comparing various E. coli proteins to the protein and DNA sequence databases. Integrated genetic and physical maps have been created for E. coli and its close relative, Salmonella typhimurium. The DNA sequences of Vibrio cholerae have also been organized into a non-redundant database, VchSeq, and ten new Vibrio genes were identified in the flanking DNA of Vibrio cholerae GenBank entries. Comparisons among the E. coli, Salmonella, and Vibrio cholerae genomes at the DNA and protein sequence level are underway.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000041-04
Application #
5203625
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
4
Fiscal Year
1995
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code