Massively parallel technologies have reduced the per-base cost of DNA sequencing by several orders of magnitude. However, limited read lengths and a lack of methods to establish contiguity over even modest distances have prevented these technologies from achieving high-quality, low-cost de novo assembly of mammalian genomes. Even as revolutionary sequencing technologies mature further, the technologies with the lowest cost per base may continue to yield reads of insufficient length or quality for effective de novo assembly of large genomes. To address this critical need, we are exploiting high-density, random, in vitro transposition as a novel means of physically shattering genomic DNA in ways that facilitate the recovery of contiguity information at different scales. Our project is divided into four aims, the first three of which are directed at developing massively parallel methods for determining short-range, mid-range, and long-range contiguity, respectively. These are: 1) a method for shattering genomic DNA with symmetric tags that post hoc inform the ordering of adjacent fragmentation events in a way that is entirely independent of the primary sequence content; 2) a method for massively parallel, in vitro barcoding of fosmid- or BAC-sized subsequences of a genome, thereby facilitating hierarchical assembly; 3) an in situ method for converting stretched DNA molecules into adaptor-flanked libraries, such that reads generated by massively parallel sequencing remain linearly ordered in terms of the XY coordinates at which they originate. In the fourth aim, we will integrate these methods to demonstrate: 1) highly cost-effective de novo assembly of the mouse genome with a quality that exceeds that of the original assembly; 2) highly cost-effective haplotype-resolved resequencing of a human genome.
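The idea behind the first aim, ordering adjacent fragments from shared tags rather than from sequence overlap, can be illustrated with a small sketch. This is not the project's pipeline. It assumes, for illustration only, that each fragmentation event leaves the same unique tag on the two fragment ends it creates, that tags are read without error, and that fragment orientation is already known. The names `Fragment` and `order_fragments` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Fragment(NamedTuple):
    """A fragment with the tag observed at each end (hypothetical model)."""
    name: str
    left_tag: Optional[str]   # None marks the end of the original molecule
    right_tag: Optional[str]


def order_fragments(fragments: List[Fragment]) -> List[List[str]]:
    """Chain fragments whose right-end tag equals another's left-end tag.

    Returns one list of fragment names per recovered chain. A missing
    fragment splits the molecule into more than one chain.
    """
    # Index fragments by their left-end tag for constant-time lookup.
    by_left = {f.left_tag: f for f in fragments if f.left_tag is not None}
    right_tags = {f.right_tag for f in fragments if f.right_tag is not None}

    # A chain starts at any fragment whose left end has no matching partner.
    starts = [
        f for f in fragments
        if f.left_tag is None or f.left_tag not in right_tags
    ]

    chains = []
    for start in starts:
        chain = [start.name]
        seen = {start.name}
        current = start
        # Follow shared tags rightwards until no partner is found.
        while current.right_tag in by_left:
            nxt = by_left[current.right_tag]
            if nxt.name in seen:  # guard against cycles from repeated tags
                break
            chain.append(nxt.name)
            seen.add(nxt.name)
            current = nxt
        chains.append(chain)
    return chains


# Fragments arrive in arbitrary order; tags alone recover A-B-C-D.
shuffled = [
    Fragment("C", "t2", "t3"),
    Fragment("A", None, "t1"),
    Fragment("D", "t3", None),
    Fragment("B", "t1", "t2"),
]
ordered = order_fragments(shuffled)
```

Note that the ordering uses only the tags and never inspects the fragment sequences, which is the sense in which such a method is independent of primary sequence content.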
As we enter an era of personalized medicine, a deep understanding of the human genome will be increasingly important to public health, contributing to the unraveling of the genetic basis of human disease and playing a growing role in clinical diagnostics. The technologies developed by this project will accelerate progress towards these goals by enabling the affordable sequencing of haplotype-resolved human genomes. These same technologies will also facilitate the high-quality, cost-effective assembly of the genomes of other mammalian species, which inform our understanding of the human genome through evolutionary analysis.
Prufer, Kay; Racimo, Fernando; Patterson, Nick et al. (2014) The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43-9
Laszlo, Andrew H; Derrington, Ian M; Ross, Brian C et al. (2014) Decoding long nanopore sequencing reads of natural DNA. Nat Biotechnol 32:829-33
Burton, Joshua N; Liachko, Ivan; Dunham, Maitreya J et al. (2014) Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 (Bethesda) 4:1339-46
Schwartz, Jerrod J; Roach, David J; Thomas, James H et al. (2014) Primate evolution of the recombination regulator PRDM9. Nat Commun 5:4370
Adey, Andrew; Burton, Joshua N; Kitzman, Jacob O et al. (2013) The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line. Nature 500:207-11