It is possible to combine single-molecule technologies to achieve de novo genome sequence assembly with phasing and genome-wide identification of structural variation. De novo assembled whole genome sequences fully describe the diploid human genome except for a small number of long, highly repetitive sequences such as the centromeres, telomeres, and near-identical segmental duplications. Key to the success of phased genome sequence assembly is the single molecule mapping approach originally developed by our group and now improved by Bionano Genomics. The method starts with sequence-specific labeling of long (180 kb to >1 Mb), double-stranded genomic DNA fragments with fluorophores, followed by high-throughput, automated imaging and analysis of the linearized fluorescent DNA molecules in nanochannel arrays on a commercially available instrument. During the next phase of this project, we propose to produce phased genome sequence assemblies of 2 individuals from each of the 26 ethnic groups of the 1000 Genomes Project to serve as general references for the community. In addition, we will further develop the single molecule labeling technology to map repetitive elements that are difficult to interrogate genome-wide and to precisely phase long-range target regions. The approach we are taking to construct de novo phased and assembled genomes will produce "near reference grade" genomes with high efficiency and at low cost for many ethnic groups around the world. These reference sequences will substantially increase the value of all the whole genome sequences already obtained and provide further insight into structural variation patterns across human populations. The technology development aims of this proposal will address some of the most difficult questions facing genome analysis today.
At the end of this four-year project, a robust method for phased genome assembly, repetitive sequence mapping, and long-range phasing will be developed and ready for application in many areas of genome research.

Public Health Relevance

We have combined technologies based on single molecules to achieve de novo genome sequence assembly with phasing and genome-wide structural variation identification, which fully describe the diploid human genome. In this project, we propose to produce, efficiently and at low cost, phased genome sequence assemblies of all 26 ethnic groups from the 1000 Genomes Project to serve as general references for the community. In addition, we will further develop the single molecule labeling technology to map repetitive elements that are difficult to interrogate genome-wide, and to precisely phase long-range target regions.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG005946-07A1
Application #: 9613450
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Smith, Michael
Project Start: 2010-09-27
Project End: 2022-11-30
Budget Start: 2018-08-24
Budget End: 2019-11-30
Support Year: 7
Fiscal Year: 2018
Total Cost:
Indirect Cost:

Name: University of California San Francisco
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 094878337
City: San Francisco
State: CA
Country: United States
Zip Code: 94118

Wong, Karen H Y; Levy-Sakin, Michal; Kwok, Pui-Yan (2018) De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations. Nat Commun 9:3040
Mak, Angel C Y; Lai, Yvonne Y Y; Lam, Ernest T et al. (2016) Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays. Genetics 202:351-62
Mostovoy, Yulia; Levy-Sakin, Michal; Lam, Jessica et al. (2016) A hybrid approach for de novo human genome sequence assembly and phasing. Nat Methods 13:587-90
McCaffrey, Jennifer; Sibert, Justin; Zhang, Bin et al. (2016) CRISPR-CAS9 D10A nickase target-specific fluorescent labeling of double strand DNA for whole genome mapping and structural variation analysis. Nucleic Acids Res 44:e11
Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun et al. (2015) Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat Methods 12:780-6
O'Bleness, Majesta; Searles, Veronica B; Dickens, C Michael et al. (2014) Finished sequence and assembly of the DUF1220-rich 1q21 region using a haploid human genome. BMC Genomics 15:387
Hastie, Alex R; Dong, Lingli; Smith, Alexis et al. (2013) Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome. PLoS One 8:e55864
Lam, Ernest T; Hastie, Alex; Lin, Chin et al. (2012) Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 30:771-6
Baday, Murat; Cravens, Aaron; Hastie, Alex et al. (2012) Multicolor super-resolution DNA imaging for genetic analysis. Nano Lett 12:3861-6