This project will demonstrate high-speed genome sequencing capabilities through the implementation of state-of-the-art automation in all steps of computer-assisted multiplex sequencing and data analysis. In years 1 and 2, the genomes of two important mycobacterial pathogens, M. tuberculosis and M. leprae, will be completed. The focus will then shift toward systematic sequencing of a 33 Mb region of human chromosome 10 (q24-q26) and a small mouse syntenic region. The sequencing milestones are: 2.4 Mb, 4 Mb, 6.4 Mb, 11.3 Mb and 20 Mb of contiguous finished sequence in years 1 through 5, respectively. A technology development team will focus on the refinement of sequencing techniques to provide routine high-quality read lengths of at least 700 nucleotides. An automation implementation team will test new instrumentation and work closely with project 2 and the informatics core. Sequence images will be generated by infrared fluorescence scanning on automatic hybridizers developed in project 2. The images will be processed on high-speed computer workstations using automated image analysis software developed by the informatics core and REPLICA™, developed by L. Mintz and G. Church at HHMI at Harvard Medical School. Contigs will be assembled and proofread using GTAC™ (developed by G. Gryan and G. Church at HHMI/HMS), GelAssemble, REPLICA™, and powerful new software being developed by the informatics group. Sequence analysis will be carried out using a variety of software tools including Blast, Grail, the Large Sequence Analysis Suite, Mycdb, and programs from the GCG and Staden packages.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Program Projects (P01)
Project #
5P01HG001106-03
Application #
6109110
Project Start
1996-07-01
Project End
1999-06-30
Support Year
3
Fiscal Year
1996
Name
Oscient Pharmaceuticals Corporation
City
Waltham
State
MA
Country
United States
Zip Code
02453
Engelstein, M; Aldredge, T J; Madan, D et al. (1998) An efficient, automatable template preparation for high throughput sequencing. Microb Comp Genomics 3:237-41
Smith, D R; Richterich, P; Rubenfield, M et al. (1997) Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res 7:802-19
Smith, D R (1996) Microbial pathogen genomes--new strategies for identifying therapeutics and vaccine targets. Trends Biotechnol 14:290-3