Expansions of tandemly repeated (TR) sequences are known to cause many genetic disorders, including Huntington's disease (HD), Fragile X syndrome and multiple forms of spinocerebellar ataxia (SCA). However, testing for known TR expansions in patients with clinical symptoms of SCA does not usually identify the underlying mutation, even in cases with a strong family history. In part, this is driven by the inability of short-read sequencing technologies to resolve repetitive genomic regions larger than the sequencing reads. We hypothesize that we can identify novel large expansions of pathogenic tandem repeats using third-generation, long-read sequencing technologies. Here, we intend to develop background TR models using a control cohort and search for potential disease TR loci in a cohort of ataxia patients. To test this hypothesis, we will use two algorithms that we have developed, MsPac and PacMonSTR, which assemble long reads into distinct haplotypes and accurately genotype tandem repeats on both the maternal and paternal haplotypes.
In Aim 1, we will genotype tandem repeats in a cohort of 600 healthy individuals sequenced with Illumina using HipSTR, and further genotype 26 healthy individuals sequenced with PacBio using MsPac and PacMonSTR. Although our preliminary data show that short reads are insufficient to detect large TRs, the majority of TRs in a normal genome are short enough to be detected with short reads, making short-read data sufficient to develop a portion of our control cohort.
In Aim 2, we introduce our cohort diagnosed with ataxia. The pedigree of these individuals shows anticipation and autosomal dominant inheritance. However, these individuals have been screened for known ataxia mutations, and many have undergone whole-genome or whole-exome sequencing with Illumina short reads without identification of a causal mutation. From our preliminary analysis, we have identified four highly polymorphic loci that might underlie a repeat expansion disease. The first step will be to screen an ataxia cohort of 96 selected samples for these loci using targeted approaches. For 25 individuals in whom no expansion is detected, we intend to perform whole-genome sequencing with PacBio and detect expanded TRs using our algorithms from Aim 1. The results of this proposal will lead to the identification of novel mutations in a set of ataxia patients, as well as a general experimental and computational framework for the identification of such mutations in any patient.
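
The comparison of patient genotypes against a control-derived background model can be illustrated with a minimal sketch. It is not the MsPac, PacMonSTR, or HipSTR implementation: the function names, the dictionary input layout, the median/MAD scoring rule, and the cutoff of 5 are all assumptions chosen for illustration. It flags loci where a patient's longer allele lies far above the control distribution of repeat copy numbers.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def flag_expansions(control_lengths, patient_alleles, cutoff=5.0):
    """Flag loci whose longer patient allele is an outlier versus controls.

    control_lengths: hypothetical dict, locus -> list of repeat copy numbers
                     observed across control haplotypes.
    patient_alleles: hypothetical dict, locus -> (allele_1, allele_2)
                     repeat copy numbers for one patient.
    cutoff:          assumed robust z-score threshold (illustrative only).
    """
    flagged = {}
    for locus, alleles in patient_alleles.items():
        controls = control_lengths.get(locus)
        if not controls:
            continue  # no background model for this locus
        centre = median(controls)
        # 1.4826 scales the MAD to match the standard deviation under
        # normality; fall back to 1.0 when all controls are identical.
        spread = 1.4826 * mad(controls) or 1.0
        score = (max(alleles) - centre) / spread
        if score >= cutoff:
            flagged[locus] = score
    return flagged


if __name__ == "__main__":
    controls = {
        "locusA": [18, 19, 20, 20, 21, 22, 23],
        "locusB": [10, 10, 11, 11, 12],
    }
    patient = {"locusA": (20, 75), "locusB": (10, 11)}
    for locus, score in flag_expansions(controls, patient).items():
        print(f"{locus}: robust z-score {score:.1f}")
```

A median-based score is used here because a handful of long alleles in the control set would inflate a mean and standard deviation. A real background model would also need to account for motif composition and genotyping uncertainty, which this sketch ignores.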

Public Health Relevance

While tandem repeat (TR) expansions have been implicated in a variety of genetic disorders, limitations in standard sequencing approaches have made it challenging to detect these events and to discover new classes of TR expansions. This study will employ third-generation sequencing technologies with custom algorithms to identify novel TR expansions responsible for causing genetic disorders. If successful, this could lead to a more comprehensive panel and methodology for screening individuals with potential repeat expansion diseases.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Neurological Disorders and Stroke (NINDS)
Type
Predoctoral Individual National Research Service Award (F31)
Project #
5F31NS108797-02
Application #
9764142
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Miller, Daniel L
Project Start
2018-07-11
Project End
2020-07-10
Budget Start
2019-07-11
Budget End
2020-07-10
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Icahn School of Medicine at Mount Sinai
Department
Genetics
Type
Schools of Medicine
DUNS #
078861598
City
New York
State
NY
Country
United States
Zip Code
10029