The objective of this proposal is to significantly improve the automated determination of DNA sequences. The practical performance limits of automated DNA sequencers are set by the separation of oligonucleotides achieved by polyacrylamide gel electrophoresis. Contemporary instruments share a basically similar design: as the oligomers in a DNA sequencing ladder pass the detector(s), multi-component analysis identifies the radioactive or fluorescent label associated with each oligomer. Under ideal conditions, determining the sequence of terminal nucleotides is straightforward. When oligomer separations or signal levels are not optimal, ambiguities or errors are likely. These appear in the DNA sequence file as miscalled, extra, missing, or unidentified bases, typically at a rate of about 1 to 3 errors per 100 bases.

An error rate near 1% is a common target for DNA sequencing performance, since comparison with complementary strand sequence data should then reduce errors to about 1 per 10,000 base pairs. This is possible only if every mismatch between the sequence and its complement is identified and correctly reconciled. Even then, error rates of 0.01% to 0.1% approximate the variation among alleles in a gene pool, and some such alleles can correlate with severe burdens of inherited pathology. Small improvements in the single-strand error rate will therefore have a substantial impact on the quality of finished sequences, from 1 error per 10,000 bp to 1 per 1,000,000 bp. Improvements are needed if automated systems are to provide longer spans of DNA sequence with fewer errors.

The emphasis of this proposal is on raw data acquisition and on new methods for translating raw data into finished DNA sequences. A rule-based expert system will be developed to reinforce conventional translation of raw data to DNA sequences. An independent pattern-recognition system will also be developed and tested, using techniques for the construction and training of neural nets. We will also evaluate two new approaches that use a single label and a single data channel for more efficient determination of DNA sequences. Alternative approaches to oligonucleotide separation for sequence analysis will also be investigated. In pursuit of these specific aims we will take advantage of the relative separations and intensities of successive oligomers in DNA sequencing ladders as independent determinants of sequence-specific data stream patterns.
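
The error-rate figures quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not part of the proposal. It assumes that errors on the two strands are independent and that every detected mismatch is reconciled correctly, so a residual error can remain only where both strands are wrong at the same position. The function names are hypothetical.

```python
def finished_error_rate(single_strand_rate: float) -> float:
    """Approximate residual error rate after comparing both strands.

    Assumption (not from the proposal text): errors on the two strands are
    independent, so both are wrong at a given position with probability p * p.
    """
    if not 0.0 <= single_strand_rate <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    return single_strand_rate * single_strand_rate


def base_pairs_per_error(single_strand_rate: float) -> int:
    """Expected number of base pairs per residual error, rounded to an integer."""
    return round(1.0 / finished_error_rate(single_strand_rate))


if __name__ == "__main__":
    # Single-strand error rates of 1% and 0.1%
    for rate in (0.01, 0.001):
        print(f"single-strand {rate:.1%}: about 1 error per "
              f"{base_pairs_per_error(rate):,} bp")
```

Under these assumptions a 1% single-strand error rate gives about 1 error per 10,000 bp, and a tenfold improvement to 0.1% gives about 1 per 1,000,000 bp, which matches the range quoted in the abstract.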

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genome Study Section (GNM)
Vanderbilt University Medical Center
Schools of Medicine
United States
Lauer, Kim P; Llorente, Isabel; Blair, Eric et al. (2004) Natural variation among human adenoviruses: genome sequence and annotation of human adenovirus serotype 1. J Gen Virol 85:2615-25
Benamira, M; Johnson, K; Chaudhary, A et al. (1995) Induction of mutations by replication of malondialdehyde-modified M13 DNA in Escherichia coli: determination of the extent of DNA modification, genetic requirements for mutagenesis, and types of mutations induced. Carcinogenesis 16:93-9
Boylan, K B; Cornblath, D R; Glass, J D et al. (1995) Autosomal dominant distal spinal muscular atrophy in four generations. Neurology 45:699-704
Soares, V M; Brzustowicz, L M; Kleyn, P W et al. (1993) Refinement of the spinal muscular atrophy locus to the interval between D5S435 and MAP1B. Genomics 15:365-71
Golden 3rd, J B; Torgersen, D; Tibbetts, C (1993) Pattern recognition for automated DNA sequencing: I. On-line signal conditioning and feature extraction for basecalling. Proc Int Conf Intell Syst Mol Biol 1:136-44
Brzustowicz, L M; Kleyn, P W; Boyce, F M et al. (1992) Fine-mapping of the spinal muscular atrophy locus to a region flanked by MAP1B and D5S6. Genomics 13:991-8