The research objectives of this proposal are enhanced performance and increased throughput for gel-based automated DNA sequencing instruments in genome sequencing laboratories. We propose to integrate our established basecalling system with upstream data acquisition hardware and software, and with downstream processing and analysis of DNA sequences. We recently determined that the CCD camera detector of the new ABI 377 DNA sequencer can be driven externally, bypassing its default limits on lateral scanning resolution and spectral bandwidth. We propose to increase scanning from 194 to 600 pixels per 1.5-second traverse of the gel, and to increase sampling from 4 to 64 wavelength intervals (filters) per pixel. Throughput would increase from the default 36 to 96 or more distinguishable sample lanes per gel. Increased detector bandwidth will improve deconvolution performance for multiple dye sets, which invites applications with more than the standard four dye labels, including alternative fluorescent dye markers. We will investigate deconvolution and pattern recognition multiplex strategies to determine favorable conditions for sequence analysis with multiple DNA samples per lane. We will evaluate and further develop lane tracking software to analyze gel images with significantly increased numbers of sample lanes. We will use our contextual pattern recognition approach to auto-editing of primary DNA sequence as an engine to generate Bayesian basecall confidence metrics. This will support automated reconciliation of basecalling discrepancies in downstream multiple sequence alignments, and thus facilitate consensus editing as a sequence finishing task. These ABI 377-based studies will be useful models for applications with new sequencing platforms under development.
Capillary and microchannel hardware will challenge contemporary basecalling systems with the increased flow of raw imaging data required to monitor their accelerated separations and dense arrays of sequencing ladders. This research contributes to cost-effective and accurate large-scale DNA sequencing, a significant technical objective of the Human Genome Initiative. This technology is essential in the long term for investigation and critical understanding of the structure, function, and sequence diversity of medically important genes.
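The scanning figures quoted in the abstract imply a large increase in raw detector readings per traverse. The following is a minimal sketch of that arithmetic, not part of the proposal itself. It counts readings rather than bytes, since the abstract does not state a sample width, and the pixels-per-lane estimate assumes lanes tile the scan width evenly with no gaps. The function and variable names are illustrative.

```python
# Figures taken from the abstract; names and simplifications are illustrative.
TRAVERSE_SECONDS = 1.5


def samples_per_traverse(pixels: int, channels: int) -> int:
    """Intensity readings collected in one lateral pass across the gel."""
    return pixels * channels


# Default ABI 377 settings versus the proposed externally driven settings.
default_samples = samples_per_traverse(pixels=194, channels=4)
proposed_samples = samples_per_traverse(pixels=600, channels=64)

# Growth in raw readings per traverse, and the resulting readings per second.
growth = proposed_samples / default_samples
default_rate = default_samples / TRAVERSE_SECONDS
proposed_rate = proposed_samples / TRAVERSE_SECONDS

# Rough lateral resolution per lane, assuming lanes share the scan evenly.
default_px_per_lane = 194 / 36
proposed_px_per_lane = 600 / 96

print(default_samples, proposed_samples)   # 776 38400
print(round(growth, 1))                    # 49.5
print(round(default_rate), round(proposed_rate))  # 517 25600
print(round(default_px_per_lane, 2), round(proposed_px_per_lane, 2))  # 5.39 6.25
```

Under these assumptions the proposed settings yield roughly 50 times more readings per traverse, while the lateral resolution available per lane stays comparable (about 5.4 versus 6.25 pixels) even as the lane count rises from 36 to 96.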

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG000562-07
Application #: 2655184
Study Section: Special Emphasis Panel (ZRG2-BIOL-1 (02))
Project Start: 1992-02-01
Project End: 2000-01-31
Budget Start: 1998-02-01
Budget End: 1999-01-31
Support Year: 7
Fiscal Year: 1998
Total Cost:
Indirect Cost:
Name: George Mason University
Department:
Type: Organized Research Units
DUNS #: 077817450
City: Fairfax
State: VA
Country: United States
Zip Code: 22030
Lauer, Kim P; Llorente, Isabel; Blair, Eric et al. (2004) Natural variation among human adenoviruses: genome sequence and annotation of human adenovirus serotype 1. J Gen Virol 85:2615-25
Benamira, M; Johnson, K; Chaudhary, A et al. (1995) Induction of mutations by replication of malondialdehyde-modified M13 DNA in Escherichia coli: determination of the extent of DNA modification, genetic requirements for mutagenesis, and types of mutations induced. Carcinogenesis 16:93-9
Boylan, K B; Cornblath, D R; Glass, J D et al. (1995) Autosomal dominant distal spinal muscular atrophy in four generations. Neurology 45:699-704
Soares, V M; Brzustowicz, L M; Kleyn, P W et al. (1993) Refinement of the spinal muscular atrophy locus to the interval between D5S435 and MAP1B. Genomics 15:365-71
Golden 3rd, J B; Torgersen, D; Tibbetts, C (1993) Pattern recognition for automated DNA sequencing: I. On-line signal conditioning and feature extraction for basecalling. Proc Int Conf Intell Syst Mol Biol 1:136-44
Brzustowicz, L M; Kleyn, P W; Boyce, F M et al. (1992) Fine-mapping of the spinal muscular atrophy locus to a region flanked by MAP1B and D5S6. Genomics 13:991-8