Genetic disorders that arise from genomic rearrangements, such as microdeletions and microduplications, that result in dosage imbalance of one or more genes are commonly referred to as genomic disorders. A significant proportion of the rearrangements associated with genomic disorders result from aberrant recombination mediated by low copy repeats, or segmental duplications (SDs), in the human genome. SDs are a class of repetitive DNA elements that range from a few kilobase pairs (kb) to 500 kb in size and make up ~5% of the human genome. Because of their large size and the fact that paralogous copies often share 95-99% sequence identity with each other, SDs are excellent substrates for non-allelic homologous recombination (NAHR), which leads to genomic rearrangements. Their large size and high sequence identity also make SD-containing regions of the genome difficult to map and sequence reliably, so it is extremely challenging to pinpoint rearrangement breakpoints within the highly homologous SDs. Furthermore, SD-containing regions are especially recalcitrant to next-generation sequencing, as short reads are difficult to map unambiguously to specific paralogs. Thus, the level of polymorphism and variability within SD-containing regions in the general population remains largely unknown, and it is difficult to test whether variability in the structure and organization of SDs influences susceptibility to NAHR. Accurate breakpoint mapping will lead to a better understanding of SD-mediated, recurrent rearrangements and help identify potential hotspots or sequences that influence susceptibility to NAHR. We hypothesize that SD-containing regions of the genome are highly variable in both structure and content. We further hypothesize that this variability results in configurations of SDs that are highly unstable and predisposed to NAHR. We propose to test these hypotheses by performing "next-generation mapping" (NGM) to map and characterize SD-containing regions of the genome. NGM is performed on the Irys platform from BioNano Genomics, a novel technology that allows visualization of long DNA molecules (up to >500 kb) in nanochannel arrays. Where appropriate, we will combine NGM with next-generation sequencing to characterize these complex regions in patients with known genomic disorders. In this project, we will analyze SD-containing regions associated with genomic disorders in normal individuals from around the world, map the deletion breakpoints in patients with genomic disorders due to SD-mediated rearrangements, and analyze the parents of these patients to look for underlying unstable configurations within the SDs that may predispose them to NAHR-mediated rearrangements. These experiments will help us better understand the structure and organization of genomic regions involved in disease-associated recurrent rearrangements.

Public Health Relevance

Genomic rearrangements that lead to structural and copy number changes are a common source of disease-associated mutations in the human genome. Some regions of the genome are more predisposed to these types of rearrangements due to the presence and complex organization of unstable sequences. By using state-of-the-art genomic technologies to map and sequence these complex regions of the genome, we will unravel the mechanisms underlying recurrent genomic rearrangements associated with a significant number of human diseases.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
University of California San Francisco
Schools of Medicine
San Francisco
United States
Shaikh, Tamim H (2017) Copy Number Variation Disorders. Curr Genet Med Rep 5:183-190