Reconstructing the complete mutational spectrum of DNA structural variation (SV) in healthy and disease populations is a central challenge of computational biology in this new era of translational genomics. High-resolution genomics technologies have catalyzed the ongoing shift in disease studies away from microarray-based detection of copy number variants (CNVs) and toward whole-genome sequencing (WGS). However, major methodological barriers have precluded comprehensive benchmarking of the sensitivity and specificity of WGS-based algorithms for CNV detection, as well as delineation of balanced and complex SVs, such as the recently discovered dupINVdup rearrangements and chromothripsis, that have been cryptic to array technology. Given the high translational potential of defining genome structural changes and characterizing their functional consequences, there is at present a unique opportunity in genomics to develop cutting-edge computational methods that detect the full mutational spectrum of SV by integrating information from new and emerging technologies. Here, I propose to develop an SV detection tool to replace the current mainstays of genetic research studies, with a focus on autism spectrum disorder (ASD) and, more broadly, neuropsychiatric disorders (NPDs). I will also seek training in three primary areas to augment my existing expertise in molecular cytogenetic methods: (1) computational genomics and analysis of WGS technologies, (2) defining multi-allelic CNVs and complex SVs in ASD, and (3) functional genomics to characterize the transcriptional impact of complex SVs and chromothripsis. Each goal is incorporated into the specific aims.
In Aim 1, I will develop and optimize algorithms for SV detection by integrating data from multiple technologies in three trios, including long-read single-molecule sequencing, PCR-free Illumina WGS, jumping library WGS, 10X Genomics synthetic long reads and haplotype phasing, BioNano Genomics optical mapping, and Hi-C.
In Aim 2, I will apply these methods to evaluate the contribution of all classes of SV to ASD through analysis of 4,000 genomes from ASD families. I will focus particularly on balanced and complex SVs and on multi-allelic CNVs, a natural extension of my PhD work creating the PennCNV and ParseCNV algorithms for CNV detection from microarray data. Finally, in Aim 3, I will train in a new area of expertise, functional genomics, to investigate the local and global transcriptional impact of complex SVs and chromothripsis in patient-specific iPS-derived neural precursors and neurons. Together, these studies will yield a sensitive and specific tool for SV detection from WGS data, with broad utility in gene discovery efforts in ASD, NPDs, and human disease, while providing targeted career development.
About 1.5% of children in the United States (1/68) are diagnosed with autism spectrum disorder (ASD), and genome structural variation (SV) is among the most common known causes of the disorder. Emerging technologies offer unprecedented opportunities to view the true complexity of the genome at reasonable cost. This project will develop algorithms to reproducibly capture the spectrum of SV in a human genome and will assay transcripts in neural cells. Together, these efforts will characterize, for the first time at sequence resolution, the largest component of genetic risk for ASD defined to date, with implications for many other human diseases.
Collins, Ryan L; Brand, Harrison; Redin, Claire E et al. (2017) Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome. Genome Biol 18:36
Redin, Claire; Brand, Harrison; Collins, Ryan L et al. (2017) The genomic landscape of balanced cytogenetic abnormalities associated with human congenital anomalies. Nat Genet 49:36-45