Large expansions of tandemly repeated (TR) DNA sequences (e.g., polyCAG) are known to underlie >30 different human neurological diseases, including Huntington's disease, Fragile X syndrome, and myotonic dystrophy. The vast majority of known TR expansions are observed in adult-onset degenerative neuromuscular disorders and ataxia syndromes. Although significant recent advances enable TRs to be genotyped from high-throughput sequencing data, the methods currently used to sequence human genomes are unable to identify large TR expansions, as they read only very short fragments of DNA. However, recent advances in, and falling costs of, sequencing technologies that generate much longer reads, such as Pacific Biosciences SMRT sequencing, hold the promise of detecting previously undetected repeat expansions. Here we will perform whole genome sequencing using Pacific Biosciences (PacBio) long-read technology on a selected cohort of patients with unsolved ataxias and Huntington's-like disease, in whom all known TR expansions and other mutations have been excluded. Many of these samples come from multi-generation pedigrees with dominant inheritance that show genetic anticipation and have linkage information localizing the pathogenic mutation to a subset of the genome, making this an optimized cohort in which to search for unknown pathogenic TR expansions. To identify TR expansions underlying human disease, it is first necessary to characterize the spectrum of tandem repeat variation within the normal population. Using the genomes of 26 individuals sequenced with PacBio, we will apply novel algorithms we have developed, MsPac and PacMONSTR, to generate a survey of the size distribution of all TRs in the normal human genome. This will be supplemented by TR genotypes generated by HipSTR from 1,500 Illumina genomes.
This information will provide a baseline survey of TR variation that will allow us to identify pathogenic TR expansions in samples with ataxia and neurodegenerative disease and, as we show, also enables us to identify candidate TRs that are likely to expand in human disease. Using this approach, we will first perform targeted genotyping, in 250 samples with SCA/HD phenocopies, of four polyglutamine TRs that show strong signatures of instability. We will next perform PacBio genome sequencing of 100 individuals from 40 pedigrees with unsolved ataxia/HD-like disease, using a selected cohort of samples in which all known genetic and environmental causes have already been excluded. We hypothesize that the causal mutations in some of these pedigrees will be novel expanded TRs that have remained invisible to previous short-read approaches. We will therefore search these genomes for novel TR expansions not observed in our control population. By combining this optimized cohort with a novel hybrid long-read sequencing approach, this proposal will lead to the identification of novel pathogenic TR expansions that underlie human neurological diseases, yielding significant advances in our understanding of the etiology of ataxia and neurodegenerative disease.
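The comparison of patient TR lengths against the control baseline can be illustrated with a minimal sketch. It is not the MsPac, PacMONSTR, or HipSTR implementation; the function name `flag_expansions`, the locus labels, the `min_excess` parameter, and the input layout (locus mapped to a list of allele lengths in repeat units) are all hypothetical and chosen only for illustration. The sketch assumes a locus is a candidate when the longest patient allele exceeds every allele observed in controls.

```python
from statistics import median


def flag_expansions(control_lengths, patient_lengths, min_excess=1):
    """Flag loci whose longest patient allele exceeds all control alleles.

    control_lengths: dict mapping locus -> list of control allele lengths
    patient_lengths: dict mapping locus -> list of patient allele lengths
    min_excess: hypothetical margin (in repeat units) above the control maximum
    """
    flagged = {}
    for locus, patient_alleles in patient_lengths.items():
        controls = control_lengths.get(locus)
        if not controls or not patient_alleles:
            # Without a control baseline (or patient data) no call is made.
            continue
        control_max = max(controls)
        patient_max = max(patient_alleles)
        if patient_max >= control_max + min_excess:
            flagged[locus] = {
                "patient_max": patient_max,
                "control_max": control_max,
                "control_median": median(controls),
            }
    return flagged


# Toy, made-up lengths purely to exercise the function.
controls = {"locusA": [10, 12, 11, 15], "locusB": [20, 22, 21]}
patient = {"locusA": [12, 60], "locusB": [21, 22], "locusC": [5, 5]}
candidates = flag_expansions(controls, patient)
print(candidates)
```

In this toy input only `locusA` is reported: `locusB` falls within the control range and `locusC` has no control baseline. A real analysis would also need to account for genotyping error, population structure, and segregation within each pedigree.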

Public Health Relevance

Expansions of tandem repeats (TRs) are known to underlie >20 different human neurological diseases, including Huntington's disease, Fragile X syndrome, and several ataxia syndromes. Many patients with inherited neurological disease have no causal mutation identified despite comprehensive genome sequencing, and because short reads map poorly to repetitive regions, pathogenic TR expansions remain essentially undetectable with standard Illumina genome sequencing. Here we propose to identify novel TR expansions underlying human disease by performing whole genome sequencing using Pacific Biosciences (PacBio) long-read technology on an optimized cohort of patients with unsolved ataxias and Huntington's-like disease, in whom all known TR expansions and pathogenic coding mutations have already been excluded by exome and/or genome sequencing.

National Institutes of Health (NIH)
National Institute of Neurological Disorders and Stroke (NINDS)
Research Project (R01)
Study Section
Genetics of Health and Disease Study Section (GHD)
Program Officer
Miller, Daniel L
Icahn School of Medicine at Mount Sinai
Schools of Medicine
New York
United States