Next-generation sequencing (NGS) platforms are fundamentally altering genetic and genomic research by providing massive amounts of data in a low-cost, high-throughput format. The main drawback of existing technologies is the short read lengths they produce. As a result, de novo assembly of daunting genomes remains impossible, and resequencing and assembling human genomes is a significant challenge in complex genomic regions. New tools are clearly needed to bridge the gap between massively parallel short-read sequencing technologies (35-500 bases) and the large scaffolds needed to assemble a genome (100,000 bases). The SBIR Phase I grant proposal "New Strategies for De Novo Sequencing of Daunting Genomes" proposes to develop a new "front end" to NGS. The technology to construct paired-end, clone-free libraries from large, randomly sheared DNA fragments (50-300 kb) has not been developed. A high-efficiency universal protocol for making clone-free libraries will generate long physical scaffolds from the paired ends of 50, 100 and 300 kb inserts, enabling accurate assembly of complex genomes, much like fosmid and BAC end sequences in conventional clone-based strategies. A new "virtual BAC" library construction technology will replace the conventional clone-based method. A clone-free 300 kb insert library will be constructed, and individual members will be completely sequenced using the new tools developed for the first time in this proposal. Producing numerous contiguous 300 kb regions of sequence from a chromosome will dramatically simplify the accurate assembly of complex genomic regions and complex genomes, much like sequencing entire BAC clones in conventional strategies.
The development of these tools could reduce the computational cost of genome assembly by 2-3 orders of magnitude, produce more complete and accurate genomes, enable the de novo sequencing of daunting genomes, and make personal genome resequencing and metagenomics tractable.
The practical result of this work will be the accurate assembly of complex regions of the human genome associated with disease, as well as the ability to assemble entire genomes using random sequencing strategies. DNA sequencing of individual human genomes can unlock the genetic basis of complex diseases and is therefore important to our medical well-being. Metagenomic analysis of hundreds of unique organisms that cannot be cultivated can uncover new metabolic pathways for small-molecule drugs and other industrial applications. True de novo sequencing of the genomes of novel, complex organisms can shed light on comparative genomics and the evolutionary history of life, and can deepen our understanding of the diversity of life on Earth, a web on which human survival depends.