Short tandem repeats, also known as microsatellites, are relatively unstable DNA elements composed of repeating subunits typically 2-6 bp in length. Large pathogenic expansions in short tandem repeats (hundreds to thousands of nucleotides) have been implicated in over 40 genetic disorders to date, which are collectively known as repeat expansion diseases. Although repeat expansion diseases represent a diverse group of genetic disorders, they typically involve the nervous system and are marked by neurological, neurodegenerative, neuromuscular, and/or developmental abnormalities. Current approaches for discovering the genetic basis of repeat expansion disorders are challenging in several respects and often limit the study of these disorders: they rely on the availability of large families with multiple affected individuals, and they employ low-throughput, laborious molecular biology methods. Genetic investigations for many diseases have now shifted to "next-generation" sequencing methods, which are capable of examining variants on a genome-wide scale. However, because next-generation sequencing reads are typically quite short (100-250 nucleotides) compared to the length of expanded repeat tracts (several hundred to several thousand nucleotides), there are technical barriers to identifying and accurately typing expanded tandem repeats using this approach. Here we propose a novel next-generation sequencing approach that will enable expanded short tandem repeats to be effectively enriched and typed using short next-generation sequencing reads. In our approach, artificial genetic diversity is introduced into monotonous repeat sequences through the incorporation of synthetic hybridization probes containing unique, random DNA tags (unique molecular identifiers).
After hybridization of these probes to target DNA, covalent linkage, and PCR amplification of converted repeat tract DNA, standard protocols for DNA fragmentation and next-generation sequencing library preparation are applied. Repeat tracts, now containing integrated molecular tags, are then subjected to massively parallel sequencing. During analysis, reads spanning adjacent molecular tags are computationally assembled into a larger contiguous sequence, enabling reconstruction of the converted tract on the basis of these synthetic, high-diversity regions. After computational removal of the molecular tags, the composition of the original tract can be accurately resolved. In our first Aim, we will develop and optimize this technology to enable accurate sizing of expanded repeats across several representative repeat expansion diseases involving tracts of different lengths and compositions. In our second Aim, we will adapt our methods to develop a multiplexed panel of probes targeting all short tandem repeats in the human genome, enabling inexpensive, high-throughput, whole-genome screening for both known and previously unknown repeat expansion diseases. The availability of robust, cost-effective, quantitative, and generally applicable tools for the detection and characterization of repeat expansion disorders will provide enhanced, transformative capabilities in the diagnosis and genetic investigation of these disorders. Consequently, these methods will greatly facilitate the genetic discovery and study of repeat expansion diseases, and will have application to typing other repetitive elements in the genome, including centromeres and telomeres.
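The analysis step described above (linking reads through shared molecular tags, then removing the tags) can be illustrated with a minimal sketch. It is not the proposal's actual pipeline, which is not specified here, and it rests on several simplifying assumptions: reads are error-free, every tag is unique, and a hypothetical upstream step has already marked tag bases in lowercase so they can be told apart from repeat bases. The function names `tag_links` and `reconstruct` and the toy CAG tract are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Assumption: tag bases were marked in lowercase by a hypothetical upstream step.
TAG = re.compile(r"[acgt]+")

def tag_links(read: str) -> List[Tuple[str, str, str]]:
    """Return (left_tag, repeat_between, right_tag) for adjacent tags in a read."""
    hits = [m for m in TAG.finditer(read)
            # A tag touching a read end may be truncated, so it is ignored.
            if m.start() > 0 and m.end() < len(read)]
    return [(a.group(), read[a.end():b.start()], b.group())
            for a, b in zip(hits, hits[1:])]

def reconstruct(reads: List[str]) -> str:
    """Chain reads through shared tags and return the tag-free repeat sequence
    lying between the first and last tag of the chain."""
    nxt: Dict[str, Tuple[str, str]] = {}
    has_predecessor = set()
    for read in reads:
        for left, segment, right in tag_links(read):
            nxt[left] = (segment, right)
            has_predecessor.add(right)
    starts = [t for t in nxt if t not in has_predecessor]
    if len(starts) != 1:
        raise ValueError("tag chain is ambiguous or incomplete")
    tag, seen, parts = starts[0], {starts[0]}, []
    while tag in nxt:
        segment, tag = nxt[tag]
        if tag in seen:
            raise ValueError("tag chain contains a cycle")
        seen.add(tag)
        parts.append(segment)  # keep only repeat bases; tags are dropped here
    return "".join(parts)

# Toy converted tract: CAG repeats interrupted by four made-up unique tags.
tags = ["acgtta", "ttgaca", "gattac", "cctgaa"]
converted = ("CAGCAG" + tags[0] + "CAG" * 3 + tags[1] + "CAG" * 2
             + tags[2] + "CAG" * 4 + tags[3] + "CAG")

# Simulated short reads: 30-base windows, each shorter than the 60-base tract.
reads = [converted[i:i + 30] for i in range(0, len(converted) - 30 + 1, 5)]

tract = reconstruct(reads)
print(tract, len(tract) // 3)
```

In this toy case no single read covers the whole tract, yet the tags order the reads unambiguously, which a pure CAG sequence could not do. The repeat bases outside the first and last tags are deliberately left out of the result, since a short read cannot show where those flanks end.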
Advanced DNA sequencing technologies are becoming increasingly integral to the study of genetic disorders but are difficult to apply to diseases caused by large pathogenic expansions of repetitive DNA elements, the so-called "repeat expansion diseases". This proposal outlines the development and validation of methods for rapid, inexpensive, and unbiased detection of pathogenic repeat expansions on a genome-wide scale. These technologies will be suitable for use as basic science tools and clinical diagnostics, will greatly facilitate the study and identification of repeat expansion diseases both in individual patients and in affected families, and will have the power to uncover previously unknown repeat expansion disorders.