We will determine the complete nucleotide sequence of the Escherichia coli chromosome and analyze it. We will evaluate, develop, and implement improved methods for gathering sequence data; explore and develop sequencing strategies; and apply and develop programs and algorithms for sequence analysis. In the past decade, the DNA sequences of viral and organelle genomes of increasing size and complexity have been determined. The largest genome sequenced to date is that of Epstein-Barr virus (182 kilobase pairs). We believe that sequencing the entire genome of a free-living organism such as Escherichia coli (approximately 5 million base pairs) is an appropriate next step, one that is both technically feasible and scientifically rewarding. The complete sequence will provide a unique opportunity to analyze the physical, genetic, and organizational features of a whole genome. At the physical level, we will be able to make global statements about the genome's size, its base composition and the distribution of that composition, the frequency and size of direct and inverted repeats, and the locations of potential loops, bends, or Z-DNA. At the level of genetic organization, we will look for families of related genes and analyze their distribution in the genome. Beyond providing a resource of biological information of inestimable value in its own right, the sequencing techniques and methodology, as well as the analysis programs and algorithms, developed in this project will have important applications for other large sequencing projects.