Two of the major challenges in genome analysis are de novo genome sequence assembly from "short read" shotgun sequencing data and structural variation analysis. At present, most medical sequencing projects and whole genome sequencing projects map the sequencing data onto the human reference genome sequence without performing whole genome assembly. When whole genome assembly is attempted, it relies on paired-end sequencing reads generated from a number of sequencing libraries with different insert sizes. The paired-end sequences provide the "scaffold" that guides sequence assembly. However, this approach increases the complexity of the sequencing project and provides limited information on the haplotypes of the diploid human genome. Similarly, current structural variation scanning based on array-based comparative genomic hybridization cannot determine the genomic locations of duplicated regions or identify genomic inversions or balanced translocations. We propose to optimize a new, highly flexible, automated optical mapping method for general use. Our genome mapping strategy starts with sequence-specific nicking of double-stranded genomic DNA, followed by displacement of a short strand of DNA downstream of the nicking site by DNA polymerase. The resulting nicked-flap structures can be labeled either with fluorescent dNTPs in a primer extension reaction or with fluorescent probes designed to complement specific sequences on the single-stranded DNA flaps. The large (100 kbp to 300 kbp) labeled DNA fragments are then linearized in nanochannels for high-throughput, automated imaging and analysis on a system assembled from off-the-shelf equipment. Through careful probe design, genome maps can be tailored to the question being asked, whether local structural variation screening, global structural variation detection, or scaffolding for de novo genome sequence assembly.
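To make the mapping idea concrete, the sketch below predicts where sequence-specific nicks (and therefore fluorescent labels) would fall on a sequence, converts those positions into the label-to-label intervals that a linearized molecule would display, and flags intervals whose size changes. It is a minimal illustration under stated assumptions, not the project's pipeline. The motif `GCTCTTC` is used only as an example (it is the site commonly cited for the nicking enzyme Nt.BspQI). Label positions are approximated by the motif start rather than the exact nick offset. The function names are hypothetical, and the element-wise comparison assumes the observed and reference labels are already matched one-to-one, whereas real map alignment must handle missing and extra labels.

```python
import re
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def nick_label_positions(seq: str, motif: str) -> List[int]:
    """Predict label sites: motif hits on either strand, in forward coordinates.

    Assumption: each label is placed at the motif start on the forward
    strand, ignoring the enzyme's exact nick offset.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        # A zero-width lookahead also reports overlapping occurrences.
        for m in re.finditer("(?=" + re.escape(pattern) + ")", seq):
            hits.add(m.start())
    return sorted(hits)


def label_intervals(positions: List[int]) -> List[int]:
    """Distances (bp) between consecutive labels, the quantity a map displays."""
    return [b - a for a, b in zip(positions, positions[1:])]


def flag_interval_changes(ref: List[int], obs: List[int],
                          tolerance: int) -> List[Tuple[int, int]]:
    """Return (interval index, size difference) where |obs - ref| > tolerance.

    Simplification: intervals are compared pairwise, assuming labels are
    already aligned one-to-one; zip stops at the shorter list.
    """
    return [(i, o - r) for i, (r, o) in enumerate(zip(ref, obs))
            if abs(o - r) > tolerance]


# Toy example: the "observed" genome carries a 15 bp insertion
# in the second labeled interval (note the reverse-strand site GAAGAGC).
MOTIF = "GCTCTTC"
ref_seq = "AAAA" + "GCTCTTC" + "A" * 20 + "GAAGAGC" + "A" * 30 + "GCTCTTC" + "AAAA"
obs_seq = "AAAA" + "GCTCTTC" + "A" * 20 + "GAAGAGC" + "A" * 45 + "GCTCTTC" + "AAAA"

ref_pos = nick_label_positions(ref_seq, MOTIF)   # [4, 31, 68]
obs_pos = nick_label_positions(obs_seq, MOTIF)   # [4, 31, 83]
ref_iv = label_intervals(ref_pos)                # [27, 37]
obs_iv = label_intervals(obs_pos)                # [27, 52]
changes = flag_interval_changes(ref_iv, obs_iv, tolerance=5)  # [(1, 15)]
print(changes)
```

Changing `MOTIF` changes which intervals are informative, which is the sense in which probe or enzyme choice tailors a map to a question. Real molecules span 100 kbp to 300 kbp and carry sizing error, so a practical tolerance would be set relative to interval length rather than fixed at a few base pairs.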
As sequencing platforms now produce short-read sequences at extremely high rates, the main obstacle to whole genome sequencing is the inability to assemble the sequencing data accurately and efficiently. Furthermore, structural variations in the human genome have been associated with a number of important diseases, but genome-wide scanning for these variations is not yet feasible. In this proposal, we aim to develop and optimize a single-molecule mapping approach that will make de novo sequence assembly and structural variation analysis possible.