We shall sequence Mycoplasma capricolum, which has an 800 kilobase chromosome. This will provide the smallest model genome for a free-living organism capable of growing in a defined medium. Since mycoplasma has only about 500 genes, one can hope to develop a complete understanding of its biology as a result of the genome sequencing. M. capricolum is a pathogen of goats and is related to a human pathogen, M. pneumoniae; understanding the organism will also shed light on its mechanism of infectivity. This organism will be sequenced by a direct technique that does not involve cloning and mapping: multiplex genomic walking, an oligonucleotide-based procedure that reveals the sequence of chromosomal DNA. PCR (polymerase chain reaction) methods will be used to resolve difficult regions and any sequence ambiguities. One round of shotgun cloning and sequencing will be needed to establish about 800 potential initiation points for the walking strategy, roughly one per kilobase of chromosome. A database of information about this organism will be developed that will contain the genomic sequence, all transcription possibilities, all open reading frames, and the identification of most of those reading frames by sequence homologies. The database will contain information about related genes in other organisms and, eventually, will include information about all of the genes of mycoplasma. This database will be distributed on compact discs. This technician-based group specializing in the direct sequencing of microorganisms should achieve a rate of one megabase/year of finished double-stranded sequence by the second year. Sequencing methods will be developed and simplified so that a rate of two to three megabases/year will be achieved by the third year. After completing the mycoplasma sequence, these same direct methods will be applied to sequence a large chromosome from yeast or another simple eukaryote.
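As an illustration of the "all open reading frames" entry planned for the database, the following is a minimal Python sketch of a six-frame ORF scan. It is not the project's software. The function names, the `min_codons` threshold, and the choice to treat only ATG as a start codon are assumptions made for this example. One detail is specific to mycoplasmas: under their genetic code (NCBI translation table 4) TGA encodes tryptophan, so only TAA and TAG are treated as stops.

```python
# Stop codons under the Mycoplasma/Spiroplasma genetic code (NCBI table 4):
# TGA encodes tryptophan there, so only TAA and TAG terminate translation.
STOPS = {"TAA", "TAG"}
START = "ATG"  # assumption: only ATG is treated as a start codon

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) offsets of ORFs in the three frames of one strand.

    `end` is exclusive and includes the stop codon. An ORF is reported only
    if it has at least `min_codons` codons before the stop.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon since the last stop
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=100):
    """Scan both strands; return (strand, start, end) in forward coordinates."""
    seq = seq.upper()
    n = len(seq)
    found = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    for s, e in orfs_in_strand(reverse_complement(seq), min_codons):
        # Map reverse-strand offsets back onto the forward strand.
        found.append(("-", n - e, n - s))
    return sorted(found, key=lambda orf: orf[1])
```

For example, `find_orfs("ATGAAATGATTTTAA", min_codons=3)` reads through the internal TGA and reports a single forward-strand ORF spanning the whole fragment; a scan using the standard genetic code would instead terminate at that TGA.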
Dolan, M; Ally, A; Purzycki, M S et al. (1995) Large-scale genomic sequencing: optimization of genomic chemical sequencing reactions. Biotechniques 19:264-8, 270-4
Smith, S W; Overbeek, R; Woese, C R et al. (1994) The genetic data environment: an expandable GUI for multiple sequence analysis. Comput Appl Biosci 10:671-5