It is possible to combine technologies based on single molecules to achieve de novo genome sequence assembly with phasing and genome-wide structural variation identification. De novo whole genome sequence assembly can fully describe the diploid human genome, except for a small number of long, highly repetitive sequences such as centromeres, telomeres, and near-identical segmental duplications. Key to the success of phased genome sequence assembly is the single molecule mapping approach, which was originally developed by our group and has since been improved by Bionano Genomics. The method starts with sequence-specific labeling of long (180 kb to >1 Mb), double-stranded genomic DNA fragments with fluorophores. This is followed by high-throughput, automated imaging and analysis of the linearized fluorescent DNA molecules in nanochannel arrays on a commercially available instrument. During the next phase of this project, we propose to produce phased genome sequence assemblies of two individuals from each of the 26 ethnic groups of the 1000 Genomes Project to serve as general references for the community. In addition, we will further develop the single molecule labeling technology to map repetitive elements that are difficult to interrogate genome-wide and to precisely phase long-range target regions. Our approach to constructing de novo phased and assembled genomes will produce "near reference grade" genomes with high efficiency and at low cost for many ethnic groups around the world. These reference sequences will substantially increase the value of all the whole genome sequences already obtained and provide further insight into structural variation patterns across human populations. The technology development aims of this proposal will address some of the most difficult questions facing genome analysis today.
At the end of this four-year project, robust methods for phased genome assembly, repetitive sequence mapping, and long-range phasing will have been developed and will be ready for application in many areas of genome research.
We have combined technologies based on single molecules to achieve de novo genome sequence assembly with phasing and genome-wide structural variation identification, which together describe the diploid human genome nearly completely. In this project, we propose to produce, efficiently and at low cost, phased genome sequence assemblies of two individuals from each of the 26 ethnic groups of the 1000 Genomes Project to serve as general references for the community. In addition, we will further develop the single molecule labeling technology to map repetitive elements that are difficult to interrogate genome-wide and to precisely phase long-range target regions.