Next-generation sequencing (NGS) platforms are fundamentally altering genetic and genomic research by providing massive amounts of data in a low-cost, high-throughput format. The main drawback of existing technologies is the short sequence read lengths they produce. Existing library prep methods are likewise limited to short inserts of only a few kb. As a result, de novo assembly of genomes is not practical with short-read NGS technologies alone. Even with a high-quality human reference genome, resequencing and assembling new human genomes remains a significant challenge when analyzing complex genomic regions. Haplotyping across more than a few kb cannot be achieved without resorting to cloned DNA. New tools are clearly needed to bridge the gap between massively parallel short-read sequencing technologies (<1,500 bases) and the large scaffolds (>20 kb) required to assemble a genome. The SBIR Phase I grant proposal "New Tools for Structural Variation Analysis, De Novo Assembly and Closing of Complex Genomes" proposes to develop a new "front end" to NGS and the software to support it. The technology to construct clone-free 20-40 kb mate pair libraries from large, randomly sheared DNA fragments does not yet exist. This technology will enable the accurate assembly of complex genomes, much as fosmid and BAC end sequences do in conventional clone-based strategies. The development of these tools could reduce the manual closing costs and computational costs of genome assembly by orders of magnitude, produce more complete and accurate genomes, enable the de novo sequencing of daunting genomes, and make personal genome resequencing and metagenomics tractable.
The practical result of this work will be new tools to accurately and affordably assemble human and microbial genomes associated with disease. DNA sequencing of individual human genomes can unlock the genetic basis of complex diseases and as such is important to our medical well-being. The complete and accurate genomic analysis of hundreds of pathogenic organisms can reveal clues regarding their lifestyles and weaknesses. True de novo sequencing of novel genomes of complex organisms can shed light on comparative genomics and the evolutionary history of life, and can improve our understanding of the diversity of life on earth, which forms a web that humans depend on for survival.