In recent years, high-throughput sequencing technologies have transformed our understanding of human genetic variation by enabling the sequencing of individual human genomes as well as sequencing at population scale. Short insertions/deletions (indels) are the second most frequent form of variation in the human genome and are functionally important. Indels have received less attention than single nucleotide variants (SNVs) and structural variants, in part because detecting indels from high-throughput sequence datasets is challenging and available computational methods exhibit significantly lower sensitivity and specificity than methods designed to identify SNVs. Novel computational methods that address the challenges of indel detection and genotyping are thus urgently needed. We propose to develop novel methods for detecting short indels from both individual and population-scale sequence datasets. These methods will use indel error rates, specific to both sequence context and sequencing platform and derived from large-scale sequence datasets, to generate accurate indel calls and genotypes. The development of these methods will significantly enhance the ability of researchers to extract accurate information about genetic variation from sequencing datasets, improve their ability to identify variants associated with disease susceptibility, and improve our understanding of the extent and distribution of short indels in the human genome.
We are developing methods for the discovery of short insertion/deletion variants, an abundant form of genetic variation in the human genome. These variants are known to affect the risk of rare and complex diseases, including cancers. This project will advance our ability to understand how these variants affect human health.