Among the stated goals of the Human Genome Project are dramatic improvements in DNA sequencing technologies and corresponding reductions in the cost per finished base. As these goals are realized, genome sequencing will likely become a more automated, high-volume activity, and similar economies will be demanded in the process of assuring and documenting the quality of the data produced. In an environment where a thorough, expert manual validation of new sequence data may often be prohibitively expensive, it would greatly benefit consumers of sequence data if the quality of base calls were stored in databases alongside the calls themselves. For this to be practical, uncertainty information must be generated in an automatic and unobtrusive manner. In the proposed research, (a) algorithms for the estimation of base probability distributions from sequencing gel lane traces will be implemented and evaluated, (b) alternative schemes for the compact storage of this information in databases will be explored, and (c) contig assembly software will be prototyped that utilizes such information for the input fragments and estimates a statistically consistent representation of the finished contig. Success of this research will promote improvements in the robustness and reliability of sequence data while reducing its cost through longer fragment reads and greater validation efficiency.
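To make aims (b) and (c) concrete, the following is a minimal illustrative sketch, not the method proposed in this project. It assumes that each read contributes an independent probability distribution over A, C, G, and T at an aligned contig position, that the prior over bases is uniform, and that a log-scaled integer quality value, Q = -10 log10(P(error)), is an acceptable compact encoding. The function names `consensus_column` and `quality_value` are hypothetical.

```python
import math

BASES = "ACGT"


def consensus_column(read_distributions):
    """Combine per-read base distributions for one aligned column.

    Assumption: reads are independent and the prior over bases is uniform,
    so the posterior is the normalized product of the per-read distributions.
    """
    log_post = {b: 0.0 for b in BASES}
    for dist in read_distributions:
        for b in BASES:
            # Floor tiny probabilities so that log(0) never occurs.
            log_post[b] += math.log(max(dist[b], 1e-12))
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(log_post.values())
    unnorm = {b: math.exp(v - peak) for b, v in log_post.items()}
    total = sum(unnorm.values())
    return {b: u / total for b, u in unnorm.items()}


def quality_value(dist, cap=99):
    """Return (most probable base, log-scaled integer quality).

    Q = -10 * log10(P(error)), capped at `cap` (an arbitrary choice here).
    """
    best = max(BASES, key=lambda b: dist[b])
    p_error = 1.0 - dist[best]
    if p_error <= 0.0:
        return best, cap
    return best, min(cap, int(round(-10.0 * math.log10(p_error))))


# Two hypothetical reads covering the same contig position.
column = [
    {"A": 0.90, "C": 0.04, "G": 0.03, "T": 0.03},
    {"A": 0.70, "C": 0.20, "G": 0.05, "T": 0.05},
]
posterior = consensus_column(column)
base, q = quality_value(posterior)
print(base, q, posterior)
```

Storing one small integer per base keeps the database overhead low, at the price of discarding the probabilities of the alternative bases; a scheme that retains the full distribution would need more space per position.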

Proposed Commercial Applications

The results of the research will be used to extend the X/Gene(TM) sequence analysis software, a comprehensive package supporting distributed processing on Unix networks that has been under development for three years and is currently in pre-release testing. Thus enhanced, it will include facilities for automatically estimating, storing, disseminating, and robustly utilizing uncertainty information in a broad range of sequence analysis applications.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
1R43HG001185-01
Application #
2209491
Study Section
Special Emphasis Panel (ZRG7-SSS-2 (09))
Project Start
1995-02-01
Project End
1995-07-31
Budget Start
1995-02-01
Budget End
1995-07-31
Support Year
1
Fiscal Year
1995
Name
Computational Biosciences, Inc.
City
Ann Arbor
State
MI
Country
United States
Zip Code
48106