Massively parallel technologies have reduced the per-base cost of DNA sequencing by several orders of magnitude. However, limited read lengths and a lack of methods to establish contiguity over even modest distances have prevented these technologies from achieving high-quality, low-cost de novo assembly of mammalian genomes. Even as revolutionary sequencing technologies mature further, the technologies with the lowest cost per base may continue to yield reads of insufficient length or quality for the effective de novo assembly of large genomes. To address this critical need, we are exploiting high-density, random, in vitro transposition as a novel means of physically shattering genomic DNA in creative ways that facilitate the recovery of contiguity information at different scales. Our project is divided into four aims, the first three of which are respectively directed at the development of massively parallel methods for determining short-range, mid-range, and long-range contiguity. These are: 1) a method for shattering genomic DNA with symmetric tags that post hoc inform the ordering of adjacent fragmentation events in a way that is entirely independent of the primary sequence content; 2) a method for massively parallel, in vitro barcoding of fosmid- or BAC-sized subsequences of a genome, thereby facilitating hierarchical assembly; 3) an in situ method for converting stretched DNA molecules into adaptor-flanked libraries, such that reads generated by massively parallel sequencing will remain linearly ordered in terms of the XY coordinates at which they originate. In the fourth aim, we will integrate these methods to demonstrate: 1) the highly cost-effective de novo assembly of the mouse genome with a quality that exceeds that of the original assembly; 2) the highly cost-effective haplotype-resolved resequencing of a human genome.
As we enter an era of personalized medicine, a deep understanding of the human genome will be increasingly important to public health, contributing to the unraveling of the genetic basis of human disease, as well as serving an increasing role in clinical diagnostics. The technologies developed by this project will accelerate progress towards these goals by enabling the affordable sequencing of haplotype-resolved human genomes. These same technologies will also facilitate the high-quality, cost-effective assembly of the genomes of other mammalian species, which inform our understanding of the human genome through evolutionary analysis.
|Salipante, Stephen J; Adey, Andrew; Thomas, Anju et al. (2016) Recurrent somatic loss of TNFRSF14 in classical Hodgkin lymphoma. Genes Chromosomes Cancer 55:278-87|
|Varoquaux, Nelle; Liachko, Ivan; Ay, Ferhat et al. (2015) Accurate identification of centromere locations in yeast genomes using Hi-C. Nucleic Acids Res 43:5331-9|
|Deng, Xinxian; Ma, Wenxiu; Ramani, Vijay et al. (2015) Bipartite structure of the inactive mouse X chromosome. Genome Biol 16:152|
|Phadnis, Nitin; Baker, EmilyClare P; Cooper, Jacob C et al. (2015) An essential cell cycle regulation gene causes hybrid inviability in Drosophila. Science 350:1552-5|
|Ma, Wenxiu; Ay, Ferhat; Lee, Choli et al. (2015) Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat Methods 12:71-8|
|Laszlo, Andrew H; Derrington, Ian M; Ross, Brian C et al. (2014) Decoding long nanopore sequencing reads of natural DNA. Nat Biotechnol 32:829-33|
|Adey, Andrew; Kitzman, Jacob O; Burton, Joshua N et al. (2014) In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res 24:2041-9|
|Schwartz, Jerrod J; Roach, David J; Thomas, James H et al. (2014) Primate evolution of the recombination regulator PRDM9. Nat Commun 5:4370|
|Prüfer, Kay; Racimo, Fernando; Patterson, Nick et al. (2014) The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43-9|
|Burton, Joshua N; Liachko, Ivan; Dunham, Maitreya J et al. (2014) Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 (Bethesda) 4:1339-46|
Showing the most recent 10 out of 17 publications