The growing use of DNA sequence data in research, databases, diagnostic and therapeutic biotechnology, and even litigation dramatically increases the need to improve the quality of the data used. This proposal addresses the problem of assembling a large set of sequenced DNA fragments into a finished consensus sequence. For a sequencing project to produce high-quality finished sequence, the assembly of fragments must be correct and accurate both in its large-scale structure and in the fine-scale alignment of individual base calls. We propose to investigate new algorithms for consensus estimation and assembly of DNA sequence fragments. Recent word-based approaches to consensus estimation show promise both for de novo assembly and for exploring alternative large-scale assemblies, which will be especially important when sequences contain large exact or approximate repeats. We propose several major enhancements to these algorithms. In particular, we will develop a global optimization algorithm for determining consensus sequences to replace current locally optimizing methods. We also propose to develop algorithms that allow alternative alignments in regions of ambiguity. This approach will allow us to assess alignment accuracy at both the large and fine scales.
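The abstract does not say how its word-based methods work, so the following is only a minimal sketch under stated assumptions: "words" are taken to be fixed-length substrings (k-mers), and the helper names `count_words`, `solid_words`, and `greedy_extend` are hypothetical. It counts words across fragments, discards rare words as likely base-call errors, and then builds a consensus by greedy extension. The greedy step is an example of the kind of locally optimizing method the proposal aims to replace; it is not the proposal's algorithm.

```python
from collections import Counter


def count_words(fragments, k):
    """Count every length-k word (k-mer) occurring in any fragment (assumed definition of 'word')."""
    counts = Counter()
    for frag in fragments:
        for i in range(len(frag) - k + 1):
            counts[frag[i:i + k]] += 1
    return counts


def solid_words(fragments, k, min_count):
    """Keep words seen at least min_count times; rarer words likely contain base-call errors."""
    counts = count_words(fragments, k)
    return {w: c for w, c in counts.items() if c >= min_count}


def greedy_extend(words, seed, max_len):
    """Extend seed one base at a time using the most frequent word that overlaps
    its last k-1 bases. Each step is a purely local (greedy) choice."""
    k = len(seed)  # assumes every word in `words` has the same length as seed
    if k < 2:
        raise ValueError("seed must be at least 2 bases long")
    consensus = seed
    # max_len guards against looping forever when repeats create cycles
    while len(consensus) < max_len:
        suffix = consensus[-(k - 1):]
        candidates = [(c, w) for w, c in words.items() if w[:-1] == suffix]
        if not candidates:
            break  # no solid word continues the sequence
        _, best = max(candidates)  # highest count wins; ties broken by the word itself
        consensus += best[-1]
    return consensus


# Hypothetical fragments of "ATGGCGTACCT"; the last one simulates a T->A base-call error.
fragments = ["ATGGCGTA", "GGCGTACC", "CGTACCT", "ATGGC", "TACCT", "GCGAACC"]
solid = solid_words(fragments, k=4, min_count=2)
print(greedy_extend(solid, "ATGG", max_len=20))  # ATGGCGTACCT
```

Because `greedy_extend` commits to the best local choice at each step, a word that occurs in two places (a repeat) can send it down the wrong branch with no way to recover. A global optimization over all candidate paths, as the abstract proposes, is one way to avoid that weakness.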

Proposed Commercial Applications

Accurate assemblies are at the heart of many sequencing projects central to biopharmaceutical, agricultural, and basic research, as well as to the Human Genome Project. The proposed advances could simultaneously increase reliability and automation in a bioinformatics software market of about $100 million per year.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #: 1R43HG001747-01
Application #: 2536784
Study Section: Special Emphasis Panel (ZRG2-SSS-Y (02))
Project Start: 1998-02-01
Project End: 1999-07-31
Budget Start: 1998-02-01
Budget End: 1999-07-31
Support Year: 1
Fiscal Year: 1998
Organization: Daniel H. Wagner Associates, Inc.
City: Malvern
State: PA
Country: United States
Zip Code: 19341