The tremendous amount of sequence data made available in recent years has increased the need to re-engineer existing bioinformatics algorithms for better performance. Our ability to organize human and mouse genomic, cDNA, and EST (expressed sequence tag) data, rapidly assemble microbial genomes, and compare sequences within and between organisms depends on programs that can operate on large amounts of data and be easily incorporated into scientific applications. In the case of the popular assembly program Phrap (P. Green, unpublished), performance improvements include the ability to perform incremental assemblies, where new sequence data are added to already assembled sequences; better memory management to accommodate larger data sets; and running the algorithm as a parallel process to reduce assembly times. Further improvements include developing an API (Application Programming Interface) so that Phrap can be better incorporated into bioinformatics applications. In this project, a prototype of Phrap will be developed that performs incremental assemblies and has improved memory management. New versions of Phrap will be structured to run as parallel processes. Finally, we will develop specifications for an API and an XML DTD (eXtensible Markup Language Document Type Definition) that will allow Phrap to be more efficiently incorporated into bioinformatics applications.
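
To make the idea of an incremental assembly concrete, the following is a minimal sketch in Python, not Phrap's actual algorithm or interface. It assumes exact suffix/prefix overlaps on a single strand, ignores base quality and reverse complements, and never merges two existing contigs. The function names `overlap`, `add_read`, and `incremental_assemble` are hypothetical and exist only for illustration.

```python
from __future__ import annotations


def overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def add_read(contigs: list[str], read: str, min_len: int = 4) -> list[str]:
    """Place one new read into existing contigs (hypothetical helper).

    The existing contigs are reused as-is; only the contig that the
    read overlaps is changed, which is the point of an incremental step.
    """
    result = list(contigs)
    for i, contig in enumerate(result):
        if read in contig:                      # read is already covered
            return result
        size = overlap(contig, read, min_len)
        if size:                                # read extends the contig rightward
            result[i] = contig + read[size:]
            return result
        size = overlap(read, contig, min_len)
        if size:                                # read extends the contig leftward
            result[i] = read + contig[size:]
            return result
    result.append(read)                         # no overlap: start a new contig
    return result


def incremental_assemble(contigs: list[str], new_reads: list[str],
                         min_len: int = 4) -> list[str]:
    """Add a batch of new reads to an already assembled set of contigs."""
    for read in new_reads:
        contigs = add_read(contigs, read, min_len)
    return contigs
```

In this toy version, a new read touches at most one contig, so the cost of adding a batch depends on the number of new reads and contigs rather than on re-comparing every original read against every other. A real assembler would also need to handle sequencing errors, both strands, and reads that join two contigs.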

Proposed Commercial Applications

Phrap is widely used in industry and academia for applications involving DNA sequences. Over 100 commercial sites would benefit from new versions of Phrap that support incremental assemblies and make better use of computing resources. An API for Phrap will encourage application development, creating additional commercialization possibilities for algorithm and application developers.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #: 1R43HG002244-01
Application #: 6211967
Study Section: Special Emphasis Panel (ZRG1-SSS-Y (01))
Program Officer: Felsenfeld, Adam
Project Start: 2000-08-01
Project End: 2001-03-31
Budget Start: 2000-08-01
Budget End: 2001-03-31
Support Year: 1
Fiscal Year: 2000
Total Cost: $98,238
Indirect Cost:
Name: Geospiza, Inc.
Department:
Type:
DUNS #:
City: Seattle
State: WA
Country: United States
Zip Code: 98107