New Word Based Methods for DNA Sequence Assembly

Karlovitz, Maximillian A.

Abstract

The growing use of DNA sequence data in research, databases, diagnostic and therapeutic biotechnology, and even litigation dramatically increases the need to improve the quality of the data being used. This proposal addresses the problem of assembling a large set of sequenced DNA fragments into a finished consensus sequence. For a sequencing project to produce high-quality finished sequence data, the assembly of sequence fragments must be correct and accurate both in its large-scale structure and in the fine-scale alignment of individual base calls. We propose to investigate new algorithms for consensus estimation and assembly of DNA sequence fragments. Recent word-based approaches to consensus estimation show promise as a method for de novo assembly and for exploring alternative assemblies at the large scale. This will be especially important when sequences contain large exact or approximate repeats. We propose several major enhancements to these algorithms. In particular, we will develop a global optimization algorithm for determining consensus sequences to replace current locally optimizing methods. We will also develop algorithms that allow alternative alignments in regions of ambiguity. This approach will allow us to assess alignment accuracy at both the large and the fine scale.
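The abstract does not spell out the word-based algorithm itself. As a rough illustration only, the sketch below shows a simple, locally optimizing (greedy) form of word-based consensus: count every length-k word (k-mer) in the reads, then extend a seed sequence one base at a time by choosing the most frequent supporting word. The function names, the seed, and the stopping rule are hypothetical choices made for this example; this is not the proposers' method, and its one-step-at-a-time choices are the kind of local optimization the proposal aims to replace with a global one.

```python
from collections import Counter


def count_words(reads, k):
    """Count every length-k word (k-mer) occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_consensus(reads, k, seed):
    """Greedily extend `seed` to the right, one base at a time (hypothetical sketch).

    At each step the next base is the one whose word (the last k-1 consensus
    bases plus the candidate base) is most frequent among the reads. Extension
    stops when no read supports any extension, or when a word would be reused,
    a crude guard against looping through a repeat.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if len(seed) < k - 1:
        raise ValueError("seed must contain at least k-1 bases")

    counts = count_words(reads, k)
    consensus = seed
    # Words already spelled out by the seed itself.
    seen = {seed[i:i + k] for i in range(len(seed) - k + 1)}

    while True:
        prefix = consensus[-(k - 1):]
        # Counter returns 0 for absent words, so unsupported bases drop out here.
        candidates = {b: counts[prefix + b] for b in "ACGT" if counts[prefix + b] > 0}
        if not candidates:
            break
        # Local (greedy) decision: take the best-supported base right now.
        best = max(candidates, key=candidates.get)
        word = prefix + best
        if word in seen:
            break
        seen.add(word)
        consensus += best
    return consensus


if __name__ == "__main__":
    # Two error-free copies of ACGTTGCA and one copy with a T->A error at position 3.
    reads = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA"]
    print(greedy_consensus(reads, k=4, seed="ACG"))  # prints ACGTTGCA
```

Here the majority word ACGT outvotes the erroneous ACGA, so the single base-call error does not reach the consensus. A greedy rule like this can still be misled inside large repeats, where the locally best word may lead into the wrong copy; that is the setting in which the abstract argues for global optimization and for exploring alternative assemblies.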

Proposed Commercial Applications

Accurate assemblies are at the heart of many sequencing projects central to biopharmaceutical, agricultural, and basic research, as well as to the Human Genome Project. The proposed advances could simultaneously increase reliability and automation in a bioinformatics software market totaling about 100 million dollars per year.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #: 1R43HG001747-01
Application #: 2536784
Study Section: Special Emphasis Panel (ZRG2-SSS-Y (02))

Project Start: 1998-02-01
Project End: 1999-07-31
Budget Start: 1998-02-01
Budget End: 1999-07-31
Support Year: 1
Fiscal Year: 1998

Institution

Daniel H. Wagner Associates, Inc., Malvern, PA, United States
