The goal of this project is to define the capabilities of computer-assisted multiplex sequencing by full-scale application to the genomes of two biomedically important organisms: Mycobacterium leprae and M. tuberculosis. These organisms represent a major world health problem, causing more deaths than any other infectious agent. Their genomic DNA sequences will provide an invaluable resource on which rational efforts at disease control can be based. Moreover, combined with sophisticated genetic capabilities, the sequences will provide a unique opportunity to explore the basic biology of these complex pathogens. Each genome will be sequenced on a cosmid-by-cosmid basis, starting with an ordered set of clones. The M. leprae genome will be done first, since a mapped set of cosmids already exists. In three years the M. leprae sequence will be finished and work on M. tuberculosis will begin. Through technical improvements, a strong emphasis on quality control, and consistent production efforts, sequencing capabilities exceeding 4 Mb of cosmid sequence per year will be developed in the last 2 years of the project. This throughput will be achieved at a total cost under 50 cents per base. Data will be gathered by digital film scanning. Sequences will be read and assembled into contigs using the REPLICA software package developed by our collaborator, Dr. G. Church, and associates at Harvard Medical School. The results will be displayed on a modified GelAssemble platform for rapid, interactive editing with instantaneous access to stored film images. Sequences will be considered "finished" when both strands have been completely sequenced at an error rate not exceeding 1 in 1000 nucleotides. This level of accuracy will be achieved by random shotgun sequencing to a mean depth of 7-fold redundancy. With this degree of coverage, little or no finishing work will be necessary.
However, we are developing a PCR-based method to facilitate any finishing sequencing that might need to be done. All of the data we produce will be entered into a mycobacterial sequence database as well as GenBank. Analysis of the sequences for genes, interesting features, regulatory elements, evolutionary conservation, etc. will be done by the mycobacteria research community, assisted by the sequence database and a network of collaborators closely associated with the project.
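
The coverage and cost figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the project's software. It assumes the standard Poisson (Lander-Waterman) model of random shotgun sequencing, which the abstract does not name, and a hypothetical cosmid insert of 40,000 bases. The function names are invented for this example. The model counts reads without regard to strand, so it says nothing about the "both strands" criterion.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Fraction of bases expected to receive zero reads under a Poisson
    model of random shotgun sequencing (an assumption, not from the abstract)."""
    return math.exp(-coverage)


def expected_uncovered_bases(target_length: int, coverage: float) -> float:
    """Expected number of bases left unsequenced in a target of the given length."""
    return target_length * uncovered_fraction(coverage)


def max_annual_cost(bases_per_year: int, cost_per_base: float) -> float:
    """Upper bound on yearly cost implied by a throughput and a per-base cost ceiling."""
    return bases_per_year * cost_per_base


if __name__ == "__main__":
    coverage = 7.0            # mean depth stated in the abstract
    cosmid_length = 40_000    # hypothetical cosmid insert size (assumption)
    print(f"Uncovered fraction at {coverage:.0f}x: {uncovered_fraction(coverage):.6f}")
    print(f"Expected uncovered bases per cosmid: "
          f"{expected_uncovered_bases(cosmid_length, coverage):.1f}")
    # 4 Mb per year at a ceiling of $0.50 per base, both figures from the abstract
    print(f"Annual cost ceiling: ${max_annual_cost(4_000_000, 0.50):,.0f}")
```

Under these assumptions, fewer than 1 base in 1000 is expected to be missed at 7-fold depth, which is consistent with the statement that little finishing work should be needed. Real libraries are not perfectly random, so actual gaps may be larger.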

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genome Research Review Committee (GRRC)
Oscient Pharmaceuticals Corporation
United States
Smith, D R; Richterich, P; Rubenfield, M et al. (1997) Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res 7:802-19
Richterich, P; Lakey, N D; Lee, H M et al. (1995) Cytosine specific DNA sequencing with hydrogen peroxide. Nucleic Acids Res 23:4922-3