The NHLBI TOPMed whole genome sequencing (WGS) studies are generating sequence reads at an unprecedented scale, totaling >2 quadrillion bases and >300 million variants across >20,000 individuals. While >97% of accessible genomic regions can be exhaustively interrogated with existing variant calling methods, the remaining ~3%, which are repeat-rich, are insufficiently interrogated because these methods have limited ability to call short tandem repeats (STRs). Because ~50% of short insertions and deletions (indels) are found in repeat-rich regions of the genome, it is important to comprehensively call STRs to reach near-complete sensitivity in identifying disease-causing variants from TOPMed WGS studies. In this application, we build on our record of developing innovative methods and analyzing petabytes of TOPMed WGS reads to generate comprehensive and accurate short variant calls from TOPMed WGS studies, with a particular focus on STRs. We leverage related and duplicated samples to improve the quality of STR calls. We also propose to estimate mitochondrial DNA copy numbers and telomere lengths from the sequence data, and to perform genome-wide association studies that demonstrate the power of the new STR-augmented callset.
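The abstract does not describe how STRs will be called. Purely as a toy illustration of what a short tandem repeat is, and not the method proposed in this application, the hypothetical function `find_perfect_strs` below scans a reference string for perfect, uninterrupted runs of a 1 to 6 bp motif repeated at least three times. The motif-length range and copy-number threshold are assumptions chosen for illustration; real STR callers also handle imperfect repeats and infer repeat-length genotypes from sequence reads.

```python
def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. 'ATAT' is not)."""
    k = len(unit)
    return not any(
        k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)
    )


def find_perfect_strs(seq, min_unit=1, max_unit=6, min_copies=3):
    """Hypothetical toy scanner: return (start, motif, copies) for perfect tandem runs.

    Thresholds (motif length 1-6 bp, >= 3 copies) are illustrative assumptions.
    """
    seq = seq.upper()
    n = len(seq)
    results = []
    for k in range(min_unit, max_unit + 1):
        i = 0
        while i + k <= n:
            unit = seq[i:i + k]
            # Skip ambiguous bases and non-primitive motifs (reported at their shorter period).
            if "N" in unit or not is_primitive(unit):
                i += 1
                continue
            copies, j = 1, i + k
            # Extend the run while the next k bases repeat the same motif exactly.
            while seq[j:j + k] == unit:
                copies += 1
                j += k
            if copies >= min_copies:
                results.append((i, unit, copies))
                i = j  # jump past the run so it is reported once
            else:
                i += 1
    return sorted(results)


print(find_perfect_strs("GATTACACACACAGG"))
```

Here the dinucleotide run `ACACACAC` starting at position 4 is reported as four copies of `AC`; the non-primitive check keeps it from also being reported as two copies of `ACAC`.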

Public Health Relevance

Short tandem repeats (STRs) account for a large fraction (>50%) of short insertions and deletions (indels) but are currently undercalled by most existing variant calling methods. Because STRs differ from simple biallelic SNPs and indels in their mutational mechanisms, recurrence rates, allele frequency spectra, and sequencing and alignment error rates, they are often poorly tagged by existing array-based SNPs and may explain a large fraction of the missing heritability of complex traits. By comprehensively calling STRs and performing genome-wide association analyses of two traits (mitochondrial DNA copy number and telomere length), which are reported to be associated with many cardiovascular and hematologic traits, we expect to uncover novel biological insights into the genetic architecture of these traits. In addition, the variant callset generated and deposited by our proposed study will motivate other investigators of TOPMed WGS studies to extend their analyses to STRs and other complex variants identified exclusively in our callset.

Agency
National Institutes of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21HL133758-02
Application #
9320985
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Luo, James
Project Start
2016-07-22
Project End
2018-10-31
Budget Start
2017-05-01
Budget End
2018-10-31
Support Year
2
Fiscal Year
2017
Total Cost
Indirect Cost
Name
University of Michigan Ann Arbor
Department
Type
Schools of Public Health
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109