The overarching goal of this project is to develop a suite of computational tools to detect structural variants (SVs) from long-read sequencing data, and to facilitate their annotation and clinical interpretation. Although short-read sequencing has been widely used in research and clinical settings, it has limited ability to identify SVs due to the presence of repeat elements. Pathogenic SVs can therefore be missed by short-read sequencing, potentially contributing to the low diagnostic rates (~30-40%) of clinical genome/exome sequencing. The lack of reliable tools for clinical interpretation of SVs further limits our ability to identify mutations that contribute to human diseases. To address these challenges, we will develop LinkedSV to detect SVs from linked-read genome and exome sequencing data generated by the 10X Genomics platform, and LongSV to detect SVs from PacBio and Nanopore long-read sequencing data. We will also develop LabelSV to analyze optical mapping data from Bionano Genomics, and to characterize complex SVs by integrating kilobase-resolution SV calls from optical mapping with base-resolution SV calls from sequencing platforms. Finally, building on our prior development of the ANNOVAR and InterVar tools, we will develop a computational method to facilitate clinical interpretation of SVs. By integrating gene dosage sensitivity, mutation intolerance, and phenotype information, this method will help assess the clinical relevance of candidate SVs to disease phenotypes. Taken together, our methods will streamline the workflow for SV detection and variant interpretation. We will distribute and maintain user-friendly software tools that implement the proposed SV detection methods and generate reproducible and traceable results conforming to the current and future versions of the ACMG (American College of Medical Genetics and Genomics) / AMP (Association for Molecular Pathology) guidelines.
We believe that our methods will substantially improve SV detection, enable consistent interpretation of SVs, and facilitate the implementation of genome-guided precision medicine.
We will develop novel bioinformatics approaches for the detection and clinical interpretation of structural variants using data generated by long-read sequencing technologies. These tools will help identify causal genetic mutations in a fraction of patients whose exome sequencing results were negative, and will facilitate the implementation of personalized genomic medicine.