The Human Genome Project spurred the development of high-throughput technologies, especially in the area of DNA sequencing. Not only has this effort produced a draft of the human genome, it has also catalyzed the development of an entire industry based on DNA sequencing and genomics. Because these technologies produce enormous amounts of data, they depend on bioinformatics programs for data management. Phrap, Cross_Match, RepeatMasker, and Consed are four programs that played an integral role in the Human Genome Project and became accepted as standard. However, as sequencing technology has evolved, so too have its applications. These new applications, which include sequencing additional genomes, EST cluster analysis, and genotyping, have highlighted the need to update the standard bioinformatics programs to meet the current needs of a broader community. In this project we will re-engineer Phrap, Cross_Match, and RepeatMasker to improve performance by optimizing their algorithms and developing a hierarchical data file to store and manipulate assembled sequence data. Phrap and Cross_Match will also be modified to use XML-formatted data, allowing users to apply constraints to sequence assembly. Lastly, we will develop a new program to review, edit, and manipulate sequences, giving users unprecedented control over their data.

Proposed Commercial Applications

Phrap is widely used in industry and academia for applications involving DNA sequences. More than 100 commercial sites would benefit from new versions of Phrap that support incremental assemblies and use computing resources more efficiently. An API for Phrap will encourage application development, creating additional commercialization possibilities for algorithm and application developers.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 3R44HG002244-03S1
Application #: 6912979
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Felsenfeld, Adam
Project Start: 2000-08-01
Project End: 2005-02-28
Budget Start: 2004-07-05
Budget End: 2005-02-28
Support Year: 3
Fiscal Year: 2004
Total Cost: $191,986
Indirect Cost:
Name: Geospiza, Inc.
Department:
Type:
DUNS #: 117537170
City: Seattle
State: WA
Country: United States
Zip Code: 98107