Even as new technologies continue to drive down the cost of DNA sequencing, we are in critical need of equivalently powerful methods informing long-range contiguity to support both de novo genome assembly and haplotype-resolved genome resequencing. With funding through this program, we have explored diverse approaches for low-cost, massively parallel capture of contiguity information. Our progress is substantial, and includes the development of a method for in situ library construction and optical sequencing, a method in which we exploit 'contact probability maps' to produce the first chromosome-scale de novo mammalian genome assemblies based exclusively on short reads, and a method that combines contiguity-preserving transposition and combinatorial indexing for accurate, megabase-scale haplotype-resolved human genome resequencing. We have also demonstrated the remarkable value of contiguity information through signature projects, including the first accurate, non-invasive prediction of a fetal genome, and the first haplotype-resolved sequencing of a cancer genome and epigenome. In this renewal application, we propose to narrow our focus to the advanced development of our two most promising approaches, namely contact probability mapping (Aim 1) and contiguity-preserving transposition (Aim 2). We will then formally evaluate these methods for cost, performance and scalability, while also seeking to integrate them with one another and with emerging sequencing paradigms (Aim 3). Coupled with a modest drop in the per-base cost of short-read DNA sequencing, these methods will enable chromosome-scale de novo assembly of large genomes as well as chromosome-scale haplotype-resolved human genome resequencing for about $1,000.
As we enter an era of personalized medicine, a deep understanding of the human genome will be increasingly important to public health, contributing to the unraveling of the genetic basis of human disease and playing a growing role in clinical diagnostics. The technologies developed by this project will accelerate progress towards these goals by enabling the affordable and comprehensive sequencing of individual human genomes. These same technologies will also facilitate the accurate sequencing and assembly of the genomes of other species, which inform our understanding of the human genome through comparative analysis.