The primary objective of this research is to improve automated analysis of gel-based DNA sequencing ladders through pattern recognition-based translation of raw instrument data into DNA sequences. We emphasize neural networks adapted to particular sequencing conditions and instruments. The performance of pattern recognition and conventional basecalling software will be evaluated: (1) where experimental errors challenge the description of natural allelic diversity in human adenoviral genomes; (2) for detection and specification of heterozygous loci in experiments with diploid templates; and (3) for primer selection and assembly operations in large-scale sequencing projects. Distributions of basecalling errors will be analyzed in the context of neighboring nucleotide identities and as a function of different sequencing strategies. Three principal advantages are expected from pattern recognition basecalling software: (1) analysis of contextual arrays of oligomer traces improves basecalling accuracy; (2) specifically tasked neural network and algorithmic processors support on-line signal conditioning and basecalling in real time; and (3) the signal conditioning and pattern recognition modules support objective measures of confidence for each basecall. This project will significantly and positively impact progress toward the stated goals of the Human Genome Initiative. No incremental costs for hardware or strategic modifications are required. Cost savings can be realized by automating the labor-intensive review and editing of primary data. Real-time basecalling supports higher-throughput instruments that exploit faster separation of larger parallel arrays of sequencing ladders. Objective basecall confidence parameters support overlap assignment during sequence assembly and should facilitate sequence-match searches through expanding databases.
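Two of the analyses named above, tabulating basecalling errors by neighboring nucleotide identity and attaching a confidence value to each call, can be illustrated with a minimal sketch. It is not the project's neural network method. The function names are hypothetical, the error tabulation assumes the called and reference sequences are already aligned base for base with no insertions or deletions, and the confidence measure is a deliberately naive stand-in (the strongest channel's share of total signal at a peak).

```python
from collections import Counter


def error_rates_by_context(called: str, reference: str) -> dict[tuple[str, str], float]:
    """Substitution error rate for each (5' neighbor, 3' neighbor) context.

    Hypothetical helper: assumes `called` and `reference` are aligned
    base for base, with no insertions or deletions.
    """
    if len(called) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    totals: Counter = Counter()
    errors: Counter = Counter()
    # The first and last positions lack one neighbor, so they are skipped.
    for i in range(1, len(reference) - 1):
        context = (reference[i - 1], reference[i + 1])
        totals[context] += 1
        if called[i] != reference[i]:
            errors[context] += 1
    return {ctx: errors[ctx] / n for ctx, n in totals.items()}


def call_with_confidence(intensities: dict[str, float]) -> tuple[str, float]:
    """Naive basecall from four-channel peak intensities (an assumed stand-in).

    The strongest channel is called; confidence is its fraction of the
    total signal. Returns ("N", 0.0) when there is no signal.
    """
    total = sum(intensities.values())
    if total <= 0:
        return "N", 0.0
    base = max(intensities, key=intensities.get)
    return base, intensities[base] / total


if __name__ == "__main__":
    # One substitution (T called as A) flanked by G on both sides.
    print(error_rates_by_context("ACGAGCA", "ACGTGCA"))
    print(call_with_confidence({"A": 10, "C": 70, "G": 10, "T": 10}))
```

With the example sequences, only the ("G", "G") context shows an error rate of 1.0; the other four contexts are 0.0. Over many aligned reads, the same tally would show whether particular neighbor pairs are systematically error-prone.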

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
2R01HG000562-04
Application #
2208900
Study Section
Genome Study Section (GNM)
Project Start
1992-02-01
Project End
1997-01-31
Budget Start
1995-02-01
Budget End
1996-01-31
Support Year
4
Fiscal Year
1995
Total Cost
Indirect Cost
Name
Vanderbilt University Medical Center
Department
Microbiology/Immun/Virology
Type
Schools of Medicine
DUNS #
004413456
City
Nashville
State
TN
Country
United States
Zip Code
37212
Lauer, Kim P; Llorente, Isabel; Blair, Eric et al. (2004) Natural variation among human adenoviruses: genome sequence and annotation of human adenovirus serotype 1. J Gen Virol 85:2615-25
Benamira, M; Johnson, K; Chaudhary, A et al. (1995) Induction of mutations by replication of malondialdehyde-modified M13 DNA in Escherichia coli: determination of the extent of DNA modification, genetic requirements for mutagenesis, and types of mutations induced. Carcinogenesis 16:93-9
Boylan, K B; Cornblath, D R; Glass, J D et al. (1995) Autosomal dominant distal spinal muscular atrophy in four generations. Neurology 45:699-704
Soares, V M; Brzustowicz, L M; Kleyn, P W et al. (1993) Refinement of the spinal muscular atrophy locus to the interval between D5S435 and MAP1B. Genomics 15:365-71
Golden 3rd, J B; Torgersen, D; Tibbetts, C (1993) Pattern recognition for automated DNA sequencing: I. On-line signal conditioning and feature extraction for basecalling. Proc Int Conf Intell Syst Mol Biol 1:136-44
Brzustowicz, L M; Kleyn, P W; Boyce, F M et al. (1992) Fine-mapping of the spinal muscular atrophy locus to a region flanked by MAP1B and D5S6. Genomics 13:991-8