The objective of this Phase I SBIR proposal is to test the feasibility of a new paradigm for storage and analysis of complete genome sequences. Inexpensive and rapid DNA sequencing technologies are anticipated in the near future. With this development, the promise of medical decision-making based on individual genetic risk will be realized. The emergence of cost-effective DNA sequencing technologies will require information systems capable of efficient storage and manipulation of many thousands of fully sequenced individual genomes. Human genomes are large, occupying upwards of 6x10^9 bytes each with naive computer storage approaches, whereas the difference between any two genomes is only around 0.1%, or 6x10^6 bytes. The key innovation in this proposal is applying Delta Compression to DNA sequence strings, allowing an entire genome to be represented by a set of differences (delta) between a reference genome (R) and a version genome (V). A database representation of delta is proposed that will achieve 60- to 120-fold compression over a text representation of DNA. In addition to the huge savings in storage space, this storage paradigm offers pre-computed polymorphism data. The innovation comprises a method for determining delta, a data model to represent delta, methods for comparing genomes based on delta, and a method for traversing R to reconstruct V based on delta. There is currently no information system that seeks to solve the problem of storing complete human genomes on this scale. Delta Compression has been widely applied to other string data to reduce network traffic and simplify data backups. The proposed application of Delta Compression in molecular biology is novel. The resulting technology and content-driven products built around this information system will have wide commercial appeal in all disciplines where the study of individual DNA variation is important.
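The idea of representing V as a delta against R can be illustrated with a minimal Python sketch. It is not the proposal's method, which the abstract does not specify. The function names (`compute_delta`, `reconstruct`, `shared_variants`) and the tuple layout of each edit are hypothetical, and Python's general-purpose `difflib.SequenceMatcher` stands in for whatever differencing algorithm the project would use; it would not scale to whole genomes.

```python
from difflib import SequenceMatcher

# Hypothetical edit layout: (ref_start, ref_end, replacement).
# reference[ref_start:ref_end] is replaced by `replacement` in the version.

def compute_delta(reference, version):
    """Determine delta: the edits that turn the reference into the version."""
    matcher = SequenceMatcher(None, reference, version, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only substitutions, insertions, deletions
            delta.append((i1, i2, version[j1:j2]))
    return delta

def reconstruct(reference, delta):
    """Traverse the reference once, splicing in each edit, to rebuild the version."""
    pieces = []
    cursor = 0
    for ref_start, ref_end, replacement in delta:
        pieces.append(reference[cursor:ref_start])  # unchanged stretch of R
        pieces.append(replacement)                  # the differing bases
        cursor = ref_end
    pieces.append(reference[cursor:])               # unchanged tail of R
    return "".join(pieces)

def shared_variants(delta_a, delta_b):
    """Compare two genomes through their deltas alone (edits both carry)."""
    return sorted(set(delta_a) & set(delta_b))

# Toy sequences, invented for illustration only.
R = "ACGTACGTACGT"
V = "ACGTTCGTACGGT"
delta = compute_delta(R, V)
assert reconstruct(R, delta) == V
```

Because every version is described relative to the same reference, two versions can be compared by intersecting their edit lists without reconstructing either sequence, which is one way to read the abstract's remark about pre-computed polymorphism data.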

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 1R44HG003295-01
Application #: 6787872
Study Section: Special Emphasis Panel (ZRG1-SSS-H (90))
Program Officer: Bonazzi, Vivien
Project Start: 2004-06-04
Project End: 2004-10-30
Budget Start: 2004-06-04
Budget End: 2004-10-30
Support Year: 1
Fiscal Year: 2004
Total Cost: $149,519
Indirect Cost:
Name: Seirad, Inc.
Department:
Type:
DUNS #: 109181086
City: Santa Fe
State: NM
Country: United States
Zip Code: 87507