We shall sequence Mycoplasma capricolum, which has an 800-kilobase chromosome. This will provide the smallest model genome for a free-living organism capable of growing in a defined medium. Since mycoplasma has only about 500 genes, one can hope to develop a complete understanding of its biology as a result of the genome sequencing. M. capricolum is a pathogen of goats and is related to a human pathogen, M. pneumoniae; understanding the organism will also shed light on its mechanism of infectivity. This organism will be sequenced by a direct technique that does not involve cloning and mapping: multiplex genomic walking, an oligonucleotide-based procedure that reveals the sequence of chromosomal DNA. PCR (polymerase chain reaction) methods will be used to resolve difficult regions and any sequence ambiguities. One round of shotgun cloning and sequencing will be needed to establish about 800 potential initiation points for the walking strategy. A database of information about this organism will be developed that will contain the genomic sequence, all transcription possibilities, all open reading frames, and the identification of most of those reading frames by sequence homologies. The database will contain information about related genes in other organisms and, eventually, will include information about all of the genes of mycoplasma. This database will be distributed on compact discs. This technician-based group specializing in the direct sequencing of microorganisms should achieve a rate of one megabase/year of finished double-stranded sequence by the second year. Sequencing methods will be developed and simplified so that a rate of two to three megabases/year will be achieved by the third year. After completing the mycoplasma sequence, these same direct methods will be applied to sequence a large chromosome from yeast or another simple eukaryote.
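The figures quoted in the abstract imply a few simple averages, which the short Python sketch below works out. The four constants come from the abstract; the function name `planning_figures` and the dictionary keys are illustrative. The calculation assumes that genes and initiation points are evenly spread along the chromosome, which is a simplification and not a claim made by the proposal.

```python
# Back-of-envelope averages from the figures quoted in the abstract.
# Assumption: genes and walking initiation points are evenly distributed.

GENOME_BP = 800_000            # chromosome size (800 kilobases)
GENE_COUNT = 500               # approximate number of genes
START_POINTS = 800             # potential initiation points for walking
RATE_BP_PER_YEAR = 1_000_000   # finished-sequence rate targeted for year two


def planning_figures(genome_bp, gene_count, start_points, rate_bp_per_year):
    """Return average gene spacing, initiation-point spacing, and the
    time needed to cover the genome once at the given rate."""
    return {
        "bp_per_gene": genome_bp / gene_count,
        "bp_between_start_points": genome_bp / start_points,
        "years_at_rate": genome_bp / rate_bp_per_year,
    }


figures = planning_figures(GENOME_BP, GENE_COUNT, START_POINTS, RATE_BP_PER_YEAR)
for name, value in figures.items():
    print(f"{name}: {value:g}")
```

Under these assumptions the averages come to roughly 1.6 kilobases per gene and one initiation point per kilobase, and the chromosome corresponds to a little under a year of output at the one-megabase/year rate.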

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG000124-02
Application #
3333154
Study Section
Special Emphasis Panel (SSS (S))
Project Start
1990-08-01
Project End
1993-07-31
Budget Start
1991-08-01
Budget End
1992-07-31
Support Year
2
Fiscal Year
1991
Total Cost
Indirect Cost
Name
Harvard University
Department
Type
Schools of Arts and Sciences
DUNS #
071723621
City
Cambridge
State
MA
Country
United States
Zip Code
02138
Dolan, M; Ally, A; Purzycki, M S et al. (1995) Large-scale genomic sequencing: optimization of genomic chemical sequencing reactions. Biotechniques 19:264-8, 270-4
Smith, S W; Overbeek, R; Woese, C R et al. (1994) The genetic data environment: an expandable GUI for multiple sequence analysis. Comput Appl Biosci 10:671-5