The NIH and DOE have established an error rate of no more than 1 in 10,000 bases as the standard for the Human Genome Project. Sequencing centers are striving to develop techniques and protocols that meet this standard at minimal cost. The effort will be greatly aided by algorithms that accurately estimate the probability that a consensus DNA sequence base call is correct. We propose to develop and implement such algorithms. The algorithms will derive consensus confidence levels from the internal evidence that supports a given base call, that is, from the primary sequences that were assembled to form the call and from the local trace data that yielded those sequences. We will develop our confidence estimation algorithms using the techniques of statistical pattern recognition, and our approach will depend essentially on analysis of historical sequence data. In addition to developing the confidence estimation algorithms, we will statistically measure the accuracy of the confidence estimates themselves. Accurate consensus confidence estimates will have considerable value beyond supporting the accuracy goals of the Human Genome Project: they will aid automation of the finishing stages of a sequencing project, and they can be reported along with base calls to sequence databases.
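To make the two tasks above concrete, the following is a minimal illustrative sketch, not the proposed method. It assumes a deliberately simple model: each read covering a consensus position contributes a base call with an error probability, read errors are independent, the prior over A/C/G/T is uniform, and an erroneous call is equally likely to be any of the other three bases. The function names (`consensus_confidence`, `phred`, `calibration`) are hypothetical. The `calibration` helper shows one standard way to measure the accuracy of confidence estimates against historical data where the true base is known: bin the predicted probabilities and compare each bin's mean prediction with its observed accuracy. On the Phred scale, an error rate of 1 in 10,000 corresponds to a quality of 40.

```python
import math

BASES = "ACGT"


def consensus_confidence(calls):
    """Return (consensus_base, probability_correct) for one alignment column.

    calls: list of (base, error_prob) pairs, one per read covering the column.
    Simplifying assumptions (illustrative only): independent read errors,
    a uniform prior over A/C/G/T, and errors spread evenly over the other bases.
    """
    log_like = {b: 0.0 for b in BASES}
    for called, p_err in calls:
        for b in BASES:
            if b == called:
                log_like[b] += math.log(1.0 - p_err)
            else:
                log_like[b] += math.log(p_err / 3.0)
    top = max(log_like.values())
    # Shift by the maximum before exponentiating to avoid underflow.
    weights = {b: math.exp(v - top) for b, v in log_like.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def phred(p_correct, floor=1e-12):
    """Phred-style quality -10*log10(error probability), flooring the error to avoid log(0)."""
    return -10.0 * math.log10(max(1.0 - p_correct, floor))


def calibration(records, n_bins=10):
    """Compare predicted confidence with observed accuracy on historical data.

    records: list of (predicted_p_correct, was_correct), where the truth is known
    (for example, from finished sequence). Returns (mean_predicted, observed_accuracy,
    count) for each non-empty bin, ordered from low to high confidence.
    """
    bins = [[] for _ in range(n_bins)]
    for p, ok in records:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, ok))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            acc = sum(1 for _, ok in b if ok) / len(b)
            out.append((mean_p, acc, len(b)))
    return out
```

For example, `consensus_confidence([("A", 0.01)] * 3)` returns `"A"` with a probability of correctness above 0.9999, which clears the 1-in-10,000 target. Well-calibrated estimates are those whose bins in `calibration` show observed accuracy close to mean prediction; a real system would replace the independence model with one learned from historical trace and assembly data.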

Proposed Commercial Applications

Algorithms for consensus confidence estimation could be implemented within software packages used by sequencing centers, such as Sequencher or the Staden package. Consensus confidence estimates will be valuable to all sectors of the sequencing community, including the pharmaceutical industry and forensics.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
1R43HG001848-01
Application #
2717825
Study Section
Special Emphasis Panel (ZRG2-GNM (02))
Program Officer
Mohla, Suresh
Project Start
1998-07-15
Project End
1999-01-14
Budget Start
1998-07-15
Budget End
1999-01-14
Support Year
1
Fiscal Year
1998
Name
Daniel H. Wagner Associates, Inc.
City
Malvern
State
PA
Country
United States
Zip Code
19341