We propose an approach for large-scale DNA sequencing which will enable completion of a sequence map for the entire human genome by the year 2002. Our effort will begin with a three-year pilot in which 50, 100 and 200 megabases (Mb) will be completed in the first, second and third years, respectively. From this starting point, we envision an additional three-year effort in St. Louis which will result in a cumulative total of one billion bases of the genome sequence. When combined with the results from a collaborative effort by John Sulston and his colleagues at the Sanger Center, over two-thirds of the genome sequence will be completed in six years. The essence of our approach will be to utilize the methods with which we have completed over 20 Mb of the C. elegans genome in the past two years. However, we will reduce the stringency which we currently apply to small gaps and other minor sequence ambiguities, focusing instead on the construction of a sequence map of 99% contiguity and 99.9% accuracy. Coding segments and other functionally important sequences will be identified through database similarities and gene prediction programs and will be finished to full contiguity and 99.99% accuracy. The sequence will be annotated with genes and other features and submitted to the public databases as rapidly as possible. To accomplish these ambitious goals, we propose three projects, supported by three cores. The Mapping Project will construct bacterial clone maps of targeted chromosomes and supply minimally overlapping clones to the Sequencing Project. The Sequencing Project, in turn, will be responsible for generating the sequence itself. Working with the Informatics Project, it will identify functionally important regions and finish them manually. The Informatics Project will then be responsible for annotating and submitting the completed sequence. The Informatics Project and Technology Development Core will continue to improve the mapping and sequencing process. The day-to-day operations of the entire Center will be supported by the Administrative and Materials Cores. The concerted efforts of all these projects and cores will be required to reach our ambitious goals.
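
The throughput and accuracy figures in the abstract imply some simple arithmetic, which the following minimal sketch works through. The yearly targets, the one-billion-base goal and the accuracy levels come from the abstract; the variable and function names are illustrative only, and the errors-per-megabase figure assumes that "accuracy" means the fraction of correctly called bases.

```python
# Yearly pilot targets in megabases (Mb), as stated in the abstract.
PILOT_TARGETS_MB = [50, 100, 200]
# "One billion bases" cumulative goal after six years, expressed in Mb.
CUMULATIVE_GOAL_MB = 1000


def expected_errors_per_mb(accuracy_percent: float) -> int:
    """Expected erroneous bases per megabase at a given per-base accuracy.

    Assumption: accuracy is the percentage of bases called correctly.
    """
    return round((100 - accuracy_percent) / 100 * 1_000_000)


# Total sequence produced by the three-year pilot.
pilot_total_mb = sum(PILOT_TARGETS_MB)
# Sequence still needed in the following three years to reach the goal.
remaining_mb = CUMULATIVE_GOAL_MB - pilot_total_mb

print(f"Pilot total: {pilot_total_mb} Mb")
print(f"Remaining for years 4-6: {remaining_mb} Mb")
print(f"Errors per Mb at 99.9%: {expected_errors_per_mb(99.9)}")
print(f"Errors per Mb at 99.99%: {expected_errors_per_mb(99.99)}")
```

Under these assumptions the pilot accounts for 350 Mb, leaving 650 Mb for the second three-year period, and the move from 99.9% to 99.99% accuracy in functionally important regions corresponds to a tenfold drop in expected errors per megabase.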

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Specialized Center (P50)
Project #: 5P50HG001458-02
Application #: 2392522
Study Section: Special Emphasis Panel (SRC (01))
Project Start: 1996-04-11
Project End: 1999-06-30
Budget Start: 1997-04-01
Budget End: 1998-06-30
Support Year: 2
Fiscal Year: 1997
Total Cost:
Indirect Cost:

Name: Washington University
Department: Genetics
Type: Schools of Medicine
DUNS #: 062761671
City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Korf, I; Gish, W (2000) MPBLAST: improved BLAST performance with multiplexed queries. Bioinformatics 16:1052-3
Bedell, J A; Korf, I; Gish, W (2000) MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics 16:1040-1
Marth, G T; Korf, I; Yandell, M D et al. (1999) A general approach to single-nucleotide polymorphism discovery. Nat Genet 23:452-6
Wendl, M C; Dear, S; Hodgson, D et al. (1998) Automated sequence preprocessing in a large-scale sequencing environment. Genome Res 8:975-84
Panussis, D A; Cook, M W; Rifkin, L L et al. (1998) A pneumatic device for rapid loading of DNA sequencing gels. Genome Res 8:543-8
Stein, L D; Thierry-Mieg, J (1998) Scriptable access to the Caenorhabditis elegans genome sequence and other ACEDB databases. Genome Res 8:1308-15
Marra, M A; Kucaba, T A; Dietrich, N L et al. (1997) High throughput fingerprint analysis of large-insert clones. Genome Res 7:1072-84