The objective of this Phase I SBIR proposal is to test the feasibility of a new paradigm for storing and analyzing complete genome sequences. Inexpensive and rapid DNA sequencing technologies are anticipated in the near future. With this development, the promise of medical decision-making based on individual genetic risk will be realized. The emergence of cost-effective DNA sequencing will require information systems capable of efficiently storing and manipulating many thousands of fully sequenced individual genomes. Human genomes are large, occupying upwards of 6 x 10^9 bytes each with naive computer storage approaches, whereas any two genomes differ by only about 0.1%, or roughly 6 x 10^6 bytes. The key innovation in this proposal is applying Delta Compression to DNA sequence strings, which allows an entire genome to be represented as a set of differences (delta) between a reference genome (R) and a version genome (V). A database representation of delta is proposed that will achieve 60- to 120-fold compression over a plain-text representation of DNA. In addition to the large savings in storage space, this storage paradigm provides pre-computed polymorphism data, because each delta is itself a record of where V varies from R. The innovation comprises a method for computing delta, a data model to represent it, methods for comparing genomes on the basis of their deltas, and a method for traversing R to reconstruct V from delta. No current information system attempts to store complete human genomes on this scale. Delta Compression has been widely applied to other string data to reduce network traffic and simplify data backups; its proposed application in molecular biology is novel. The resulting technology, and the content-driven products built around this information system, will have wide commercial appeal in every discipline where the study of individual DNA variation is important.
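To make the idea of "representing V as a delta against R, then traversing R to rebuild V" concrete, here is a minimal sketch in Python. It is a toy illustration under stated assumptions, not the method proposed here. The record format `(start, end, insert)` and the function names `compute_delta` and `apply_delta` are hypothetical. The differencing step uses the standard-library `difflib.SequenceMatcher` as a stand-in, and its cost makes it unsuitable for genome-scale strings.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical delta record: replace reference[start:end] with `insert`.
Edit = Tuple[int, int, str]


def compute_delta(reference: str, version: str) -> List[Edit]:
    """Return the edits that turn `reference` into `version`.

    Only differing regions are stored; identical stretches are implied by R.
    Uses difflib as a stand-in differencer (toy scale only).
    """
    matcher = SequenceMatcher(None, reference, version, autojunk=False)
    delta: List[Edit] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' and 'insert' all reduce to one record:
            # drop reference[i1:i2] and emit version[j1:j2] in its place.
            delta.append((i1, i2, version[j1:j2]))
    return delta


def apply_delta(reference: str, delta: List[Edit]) -> str:
    """Rebuild the version by walking the reference left to right.

    Assumes edits are sorted by start position and do not overlap,
    which holds for the opcodes produced by SequenceMatcher.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end, insert in delta:
        pieces.append(reference[cursor:start])  # unchanged stretch copied from R
        pieces.append(insert)                   # variant bases taken from V
        cursor = end                            # skip replaced/deleted bases in R
    pieces.append(reference[cursor:])           # tail of R after the last edit
    return "".join(pieces)


if __name__ == "__main__":
    R = "ACGTACGTACGT"
    V = "ACGTTCGTACGAT"  # one substitution and one insertion relative to R
    delta = compute_delta(R, V)
    print(delta)
    assert apply_delta(R, delta) == V
```

Only the differing regions are kept, so for two genomes that agree at about 99.9% of positions the delta is far smaller than V itself. Each record also directly names a variant position in R, which is the sense in which a delta doubles as pre-computed polymorphism data.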