Two of the major challenges in genome analysis are de novo genome sequence assembly from "short-read" shotgun sequencing data and structural variation analysis. At present, most medical sequencing projects and whole genome sequencing projects map the sequencing data onto the reference human genome sequence without performing whole genome assemblies. When whole genome assembly is attempted, it is done by generating paired-end sequencing reads from a number of sequencing libraries with different insert sizes. The paired-end sequences provide the "scaffold" that helps with sequence assembly. However, this approach increases the complexity of the sequencing project and provides limited information on the haplotypes of the diploid human genome. Similarly, current structural variation scanning based on array-based comparative genomic hybridization cannot determine the genomic locations of duplicated regions or identify genomic inversions or balanced translocations. We propose to optimize a new, highly flexible, automated optical mapping method for general use. Our genome mapping strategy starts with sequence-specific nicking of double-stranded genomic DNA, followed by displacement of a short strand of DNA downstream of the nicking site by DNA polymerase. The resulting nicked-flap structures can be labeled with fluorescent dNTPs in a primer extension reaction or with fluorescent probes designed to complement specific sequences found on the single-stranded DNA flaps. The large (100 kbp to 300 kbp) labeled DNA fragments are then linearized in nano-channels for high-throughput, automated imaging and analysis on a system assembled from off-the-shelf equipment. Through intelligent probe design, one can therefore create genome maps tailored to the question being asked, whether that is local structural variation screening, global structural variation detection, or scaffolding for de novo genome sequence assembly.
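To make the mapping idea concrete, the following is a minimal sketch of how an expected label map could be predicted in silico from a known sequence. It is an illustration under stated assumptions, not the analysis pipeline of this proposal: the function names are hypothetical, the recognition motif is passed in by the caller (the example motif GCTCTTC is a placeholder, since no specific nicking enzyme is named here), and label positions are approximated by the start coordinate of each motif occurrence on either strand.

```python
# Hypothetical helpers for predicting a label map from a reference sequence.
# Assumption: a label is placed at every occurrence of the recognition motif,
# on either strand, and its position is the 0-based start of the match on the
# forward strand.

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_label_sites(sequence, motif):
    """Return sorted 0-based start positions of the motif on both strands."""
    sequence = sequence.upper()
    motif = motif.upper()
    sites = set()
    # A set of patterns avoids double counting when the motif is palindromic.
    for pattern in {motif, reverse_complement(motif)}:
        start = sequence.find(pattern)
        while start != -1:
            sites.add(start)
            # Advance by one base so overlapping occurrences are also found.
            start = sequence.find(pattern, start + 1)
    return sorted(sites)


def label_intervals(sites):
    """Return distances in bp between consecutive label sites."""
    return [b - a for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    # Toy sequence with one forward-strand and one reverse-strand site.
    toy = "AA" + "GCTCTTC" + "AAAA" + "GAAGAGC" + "TT"
    sites = find_label_sites(toy, "GCTCTTC")
    print(sites, label_intervals(sites))
```

The list of inter-label distances is the quantity one would compare against distances measured on imaged molecules; an inversion, insertion, or deletion would show up as a change in the order or size of these intervals relative to the prediction.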
Because sequencing platforms now produce short-read sequences at extremely high rates, the main obstacle to whole genome sequencing is the inability to assemble the sequencing data accurately and efficiently. Furthermore, structural variations in the human genome are associated with a number of important diseases, but genome-wide scanning for these variations is not yet feasible. In this proposal, we aim to develop and optimize a single-molecule mapping approach that will make de novo sequence assembly and structural variation analysis possible.