Even as new technologies continue to drive down the cost of DNA sequencing, we are in critical need of equivalently powerful methods for capturing long-range contiguity to support both de novo genome assembly and haplotype-resolved genome resequencing. With funding through this program, we have explored diverse approaches for low-cost, massively parallel capture of contiguity information. Our progress is substantial, and includes the development of a method for in situ library construction and optical sequencing, a method in which we exploit 'contact probability maps' to produce the first chromosome-scale de novo mammalian genome assemblies based exclusively on short reads, and a method that combines contiguity-preserving transposition and combinatorial indexing for accurate, megabase-scale haplotype-resolved human genome resequencing. We have also demonstrated the remarkable value of contiguity information through signature projects, including the first accurate, non-invasive prediction of a fetal genome, and the first haplotype-resolved sequencing of a cancer genome and epigenome. In this renewal application, we propose to narrow our focus to the advanced development of our two most promising approaches, namely contact probability mapping (Aim 1) and contiguity-preserving transposition (Aim 2). We will then formally evaluate these methods for cost, performance and scalability, while also seeking to integrate them with one another and with emerging sequencing paradigms (Aim 3). Coupled with a modest drop in the per-base cost of short-read DNA sequencing, these methods will enable chromosome-scale de novo assembly of large genomes as well as chromosome-scale haplotype-resolved human genome resequencing for about $1,000.
As we enter an era of personalized medicine, a deep understanding of the human genome will be increasingly important to public health, both in unraveling the genetic basis of human disease and in playing a growing role in clinical diagnostics. The technologies developed by this project will accelerate progress towards these goals by enabling the affordable and comprehensive sequencing of individual human genomes. These same technologies will also facilitate the accurate sequencing and assembly of the genomes of other species, which inform our understanding of the human genome through comparative analysis.
|Salipante, Stephen J; Adey, Andrew; Thomas, Anju et al. (2016) Recurrent somatic loss of TNFRSF14 in classical Hodgkin lymphoma. Genes Chromosomes Cancer 55:278-87|
|Varoquaux, Nelle; Liachko, Ivan; Ay, Ferhat et al. (2015) Accurate identification of centromere locations in yeast genomes using Hi-C. Nucleic Acids Res 43:5331-9|
|Deng, Xinxian; Ma, Wenxiu; Ramani, Vijay et al. (2015) Bipartite structure of the inactive mouse X chromosome. Genome Biol 16:152|
|Phadnis, Nitin; Baker, EmilyClare P; Cooper, Jacob C et al. (2015) An essential cell cycle regulation gene causes hybrid inviability in Drosophila. Science 350:1552-5|
|Ma, Wenxiu; Ay, Ferhat; Lee, Choli et al. (2015) Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat Methods 12:71-8|
|Laszlo, Andrew H; Derrington, Ian M; Ross, Brian C et al. (2014) Decoding long nanopore sequencing reads of natural DNA. Nat Biotechnol 32:829-33|
|Adey, Andrew; Kitzman, Jacob O; Burton, Joshua N et al. (2014) In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res 24:2041-9|
|Schwartz, Jerrod J; Roach, David J; Thomas, James H et al. (2014) Primate evolution of the recombination regulator PRDM9. Nat Commun 5:4370|
|Prüfer, Kay; Racimo, Fernando; Patterson, Nick et al. (2014) The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43-9|
|Burton, Joshua N; Liachko, Ivan; Dunham, Maitreya J et al. (2014) Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 (Bethesda) 4:1339-46|