DNA sequencing is currently in the midst of disruptive technological shifts, with 454, Illumina, and SOLiD providing enormous throughput increases and large reductions in cost per base. Massively parallel technologies deliver a few Gbp of sequence per week as short fragments, or reads. Applications of sequencing that were only recently considered impractical are now enabled: personal genome sequencing, "metagenomics" analysis of 'soups' containing several to hundreds of unique organisms, and de novo sequencing of novel genomes of complex organisms. No matter how the sequencing is done, reads must be assembled computationally if they are to be useful. Given the read length and read quality limitations of the new instruments and the massive volume of data generated, the computational assembly problem is becoming critical, with the cost of computational infrastructure and personnel exceeding reagent and instrument-related costs. Moreover, the results of assembly are currently far from ideal; for example, much of the human genome remains invisible due to its high percentage of repeats. We propose to develop a new "front end" to next-gen sequencers for DNA preparation, the "Read-Cloud Method", which can reduce the computational cost of genome assembly by 2-3 orders of magnitude, produce more complete and accurate genomes, and make metagenomics tractable. We propose a hierarchical sequencing approach that requires no bacterial cloning. We will achieve this by handling single DNA molecules, tiled across the genome with high redundancy, on microfluidic devices.
We will design, prototype, and thoroughly test technology to (i) shear genomic DNA into 200-kbp fragments with narrow size distributions; (ii) randomly amplify each individual 200-kbp DNA molecule in isolation, within a porous gel microcontainer that will be formed around the dsDNA molecule within a microdevice; (iii) digest the micro-encapsulated DNA into small fragments of tunable size; (iv) bar-code the progeny of each 200-kbp DNA molecule with a 12mer oligonucleotide, to identify each read as associated with a particular 200-kbp DNA molecule. A planar microfluidic device will be fabricated to allow one unique bar-code sequence to be blunt-end-ligated to both DNA termini. The bar-coded DNA is then pooled and sequenced on a next-gen platform. The result is a highly reducible data set. The method and algorithm are universally applicable to next-generation platforms. The PIs (Batzoglou, Barron, Shaqfeh, Quake) will collaborate to develop an efficient approach to hierarchical sequencing in microfluidic devices.
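To illustrate why bar-coded reads form a reducible data set, the following is a minimal sketch of how pooled reads might be partitioned by bar-code before assembly. It is not the project's algorithm. The function name `group_reads_by_barcode` is hypothetical, and the sketch assumes (this is not stated in the text) that the 12mer bar-code occupies the first 12 bases of each read and that reads arrive as plain strings. A 12mer over four bases allows 4^12 = 16,777,216 distinct bar-codes.

```python
from collections import defaultdict

BARCODE_LENGTH = 12  # the 12mer bar-code length given in the text


def group_reads_by_barcode(reads, barcode_length=BARCODE_LENGTH):
    """Partition reads into per-bar-code groups ("read clouds").

    Assumption (hypothetical): the bar-code is the read's first
    `barcode_length` bases. Reads that are too short, or whose
    bar-code contains a base other than A, C, G, or T, are skipped.
    """
    clouds = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_length:
            continue  # no insert sequence left after the bar-code
        barcode = read[:barcode_length].upper()
        if set(barcode) - set("ACGT"):
            continue  # ambiguous base call inside the bar-code
        # Store only the insert sequence, with the bar-code trimmed off.
        clouds[barcode].append(read[barcode_length:])
    return dict(clouds)
```

Under these assumptions, each group holds reads derived from a single 200-kbp molecule, so each group could be assembled independently as a small problem rather than assembling all reads against the whole genome at once.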
Project Narrative: Gene sequencing is important to medicine. Our DNA sequencing method has the potential to reduce the computational cost of assembly by orders of magnitude while making the assembled genomes significantly more complete and accurate. The key step is the use of microfluidic handling technologies to subdivide genomic DNA into 200-kbp fragments, which are then amplified in isolation from each other and uniquely labeled to form a highly reducible data set for genome assembly.