The broad goal of the proposed work is to advance the efficiency of DNA sequencing by further developing estimation methods that we recently devised for restriction-fragment sizing and applying them to data from automatic electrophoretic-gel readers. We believe that such methods show promise for significantly extending the readable range of sequence gels, thereby improving the yield from sequencing studies employing high-resolution electrophoresis. Our general approach to the analysis of data from electrophoretic gels is to verify and capitalize on models developed by others, where appropriate; to develop and evaluate new models as needed to describe the processes that generate the observed data; and to apply those models in estimation methods designed to determine the phenomenon of interest most likely to have given rise to the data.
Specific aims for accomplishing the proposed work are: 1) To develop and/or import, refine, and verify models of band mobility, amplitude, and shape as a function of time and local sequence, and of baseline variation with space and time; 2) To develop and evaluate base-calling algorithms employing priors derived from the models developed in aim 1 and based on the posterior likelihood of the data under both Gaussian and Poisson models for fluorescent emission; 3) To subsequently explore the application of methods for joint stochastic and symbolic inference in order to incorporate more complex models, such as would be needed to analyze aberrancies such as compressions; 4) To develop and apply a multilevel (e.g., symbolic, profile, and scan-intensity), annotated evaluation database for quantifying performance of the algorithms with respect to error types and their frequencies; and 5) To initiate early dissemination ("beta testing") of the algorithms and the evaluation database; subsequent full-scale dissemination would be under the auspices of planned continuation support of our laboratory from the Biomedical Research Technology Program of NCRR, NIH. Because of the size of the human genome, full sequence determination will depend not only on the development of low-cost, reliable technologies that can be automated but also on strategies to improve the information yield from those technologies.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genome Study Section (GNM)
Washington University
Biostatistics & Other Math Sci
Schools of Medicine
Saint Louis
United States
Drury, H A; Clark, K W; Hermes, R E et al. (1992) A graphical user interface for quantitative imaging and analysis of electrophoretic gels and autoradiograms. Biotechniques 12:892-8, 900-1