The E. coli genome contains over 3000 genes and is currently over 50% sequenced. A complete high-resolution restriction map for the entire genome is available. This makes the E. coli genome project the most advanced of all cellular genome projects. This information has been collected and organized into a cohesive information base, unifying the efforts of many laboratories into a single data resource. This project includes software development, database development, and data analysis. Software developed or enhanced during the reporting period includes ChromoScope, a new genomic sequence viewer and editor. Two relational databases have been developed: GeneScape, a Macintosh database of genomic map information that is essentially complete, and EC-BASE, a Sybase database of E. coli map and DNA sequence information. Report generators for ECBASE now produce complete ASN.1 and GenBank flatfile reports of EcoSeq contigs. DNA sequence and genomic restriction map data have been analyzed to determine the information content of ribosome binding sites, the number and distribution of genomic restriction sites, repeated patterns in DNA sequences, and the distribution and categorization of proteins encoded in the genome; to assign genes to individual clones in the ordered clone set of the E. coli genome; and to detect putative new genes in the DNA sequence flanking known genes. One sequencing gap has been closed, revealing the sequence of two new E. coli genes, gpmA and galM.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000041-02
Application #
3759313
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
2
Fiscal Year
1993
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code