Rare and highly disease-penetrant human genetic variation offers the potential to accelerate our understanding of the mechanisms and pathways that contribute to type 2 diabetes (T2D), opening opportunities to translate findings into therapeutic targets and improved individual risk stratification. Motivated by this potential, large-scale sequencing studies have been undertaken to systematically catalog rare variation (frequency well below 1%) across the entire genome. These efforts have revealed thousands of rare variants of unclear functional significance. The identification of causal rare variants has been hindered in three ways. First, current analytical practices focus on the coding genome for ease of interpretability, leaving the role of noncoding variation unevaluated despite its clear importance for disease risk. Second, rare variant burden is calculated at the level of individual genes rather than across biological networks of genes or potentially functional noncoding regions, ignoring the polygenic nature of T2D; this reflects a lack of computational methods for defining credible groupings and evaluating them collectively. Finally, existing algorithms to identify pathogenic candidates are underpowered, a problem exacerbated by prohibitive replication costs, which impedes efforts to demonstrate compelling statistical association between rare variants and disease. Overcoming these challenges will allow us to evaluate the hypothesis that rare, particularly noncoding, variation contributes to T2D risk, which is the aim of this proposal.
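The contrast between gene-level and network-level burden testing can be illustrated with a minimal collapsing test. This is a sketch under stated assumptions, not the method the proposal develops: it assumes qualifying rare variants have already been called and filtered, uses hypothetical gene names and a toy case/control sample, and substitutes a simple carrier-collapsing Fisher's exact test for whatever statistic the project ultimately uses.

```python
from math import comb
from typing import Dict, Set

# Hypothetical input layout: genotypes[individual][gene] = number of qualifying
# rare alleles that individual carries in that gene (a missing gene means 0).
Genotypes = Dict[str, Dict[str, int]]


def is_carrier(counts: Dict[str, int], gene_set: Set[str]) -> bool:
    """True if the individual carries >= 1 qualifying rare allele in any gene of the set."""
    return any(counts.get(gene, 0) > 0 for gene in gene_set)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]
    (rows: cases, controls; columns: carriers, non-carriers)."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the row1 cases.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Sum the probabilities of all tables no more probable than the observed one.
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-9)))


def collapsing_burden_test(genotypes: Genotypes, is_case: Dict[str, bool],
                           gene_set: Set[str]) -> float:
    """Collapse rare variants across a gene set into one carrier indicator per
    individual and compare carrier frequency between cases and controls."""
    a = b = c = d = 0
    for ind, counts in genotypes.items():
        carrier = is_carrier(counts, gene_set)
        if is_case[ind]:
            a += carrier
            b += not carrier
        else:
            c += carrier
            d += not carrier
    return fisher_exact_two_sided(a, b, c, d)


# Toy data: each case carries a rare allele in a *different* gene of one network.
network = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}  # hypothetical gene names
genotypes = {
    "case1": {"GENE_A": 1}, "case2": {"GENE_B": 1},
    "case3": {"GENE_C": 1}, "case4": {"GENE_D": 1},
    "ctrl1": {}, "ctrl2": {}, "ctrl3": {}, "ctrl4": {},
}
is_case = {ind: ind.startswith("case") for ind in genotypes}

print(collapsing_burden_test(genotypes, is_case, {"GENE_A"}))  # single gene: 1.0
print(collapsing_burden_test(genotypes, is_case, network))     # whole network: 2/70, about 0.029
```

In this contrived example no single gene shows any signal (p = 1.0), while collapsing the same variants across the network yields p of about 0.029. Real data would require covariate adjustment, variant weighting, and multiple-testing control, none of which are modeled here.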
We will: (1) develop algorithms to model expected levels of coding and noncoding polymorphism across human populations using features empirically learned from publicly available data sets (1000 Genomes, NHLBI Exomes), implemented in a new rare variant burden test for association; (2) develop computational informatics and systems-based approaches to uncover pathogenic T2D gene networks based on genetic data from hundreds of loci implicated in T2D risk and related traits; (3) apply our new algorithms and identified gene networks to evaluate rare variant burden for T2D in ~2,850 individuals sequenced across the genome; and (4) demonstrate T2D relevance via replication using cost-effective multiplex targeted re-sequencing in >33,000 individuals. Completion of these aims will result in the development and public release of software for the analysis of noncoding variation, and in the identification of networks and rare variants contributing to T2D susceptibility.
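Aim (1) rests on the idea that local sequence context predicts how often a site is polymorphic, so that observed variation in a region can be compared with an expectation. The sketch below shows the simplest version of that idea and is an assumption-laden illustration, not the proposal's model: it estimates a per-k-mer polymorphism rate from one training sequence with known polymorphic positions (standing in for large reference panels such as 1000 Genomes), sums those rates to get an expected count for a region, and uses a Poisson lower tail, our own simplification, to flag depletion. The function names are hypothetical; the 7-mer default echoes the expanded sequence context model of Aggarwala and Voight (2016), which additionally models specific substitutions and much more data.

```python
from collections import defaultdict
from math import exp, factorial
from typing import Dict, Iterator, Set, Tuple

K = 7  # hypothetical default: k-mer centered on the site (3 flanking bases per side)


def contexts(seq: str, k: int = K) -> Iterator[Tuple[int, str]]:
    """Yield (position, centered k-mer) for every position with full flanks."""
    h = k // 2
    for i in range(h, len(seq) - h):
        yield i, seq[i - h:i + h + 1]


def fit_context_rates(seq: str, polymorphic_positions: Set[int],
                      k: int = K) -> Dict[str, float]:
    """Estimate, for each centered k-mer, the fraction of its sites observed as polymorphic."""
    total: Dict[str, int] = defaultdict(int)
    poly: Dict[str, int] = defaultdict(int)
    for i, kmer in contexts(seq, k):
        total[kmer] += 1
        poly[kmer] += i in polymorphic_positions
    return {kmer: poly[kmer] / total[kmer] for kmer in total}


def expected_polymorphic_sites(seq: str, rates: Dict[str, float], k: int = K) -> float:
    """Expected number of polymorphic sites in a region: sum of per-site context rates
    (contexts unseen in training contribute 0 in this simplified sketch)."""
    return sum(rates.get(kmer, 0.0) for _, kmer in contexts(seq, k))


def poisson_lower_tail(observed: int, mu: float) -> float:
    """P(X <= observed) for X ~ Poisson(mu); small values suggest depleted variation."""
    return sum(exp(-mu) * mu ** j / factorial(j) for j in range(observed + 1))


# Toy example with k = 3: the C of each "ACG" (a CpG site) is polymorphic
# at positions 1 and 4 of the training sequence.
rates = fit_context_rates("ACGACGACGACG", polymorphic_positions={1, 4}, k=3)
mu = expected_polymorphic_sites("ACGACG", rates, k=3)  # 0.5 + 0 + 0 + 0.5 = 1.0
p_depletion = poisson_lower_tail(0, mu)                # no variants observed: exp(-1)
print(rates, mu, p_depletion)
```

Summing per-site rates treats sites as independent Bernoulli trials, and the Poisson tail is only an approximation to their total; a burden test built on such expectations would compare observed rare variants in cases and controls against context-adjusted baselines rather than using this single-region check.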

Public Health Relevance

Worldwide, the increasing incidence of type 2 diabetes is placing a critical burden on health care, demanding new approaches to treatment and intervention. Rare noncoding mutations with large effects on T2D predisposition offer the promise of novel innovations in clinical practice, though identifying these mutations remains challenging. To overcome this challenge, our proposal develops new computational methodology to pinpoint relevant variation and the pathways in which it falls, and pursues replication efforts to demonstrate conclusive association.

National Institutes of Health (NIH)
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Blondel, Olivier
University of Pennsylvania
Schools of Medicine
United States
Yin, Peter; Anttila, Verneri; Siewert, Katherine M et al. (2017) Serum calcium and risk of migraine: a Mendelian randomization study. Hum Mol Genet 26:820-828
Scott, Robert A; Scott, Laura J; Mägi, Reedik et al. (2017) An Expanded Genome-Wide Association Study of Type 2 Diabetes in Europeans. Diabetes 66:2888-2902
Aggarwala, Varun; Ganguly, Arupa; Voight, Benjamin F (2017) De novo mutational profile in RB1 clarified using a mutation rate modeling algorithm. BMC Genomics 18:155
Brynedal, Boel; Choi, JinMyung; Raj, Towfique et al. (2017) Large-Scale trans-eQTLs Affect Hundreds of Transcripts and Mediate Patterns of Transcriptional Co-regulation. Am J Hum Genet 100:581-591
Siewert, Katherine M; Voight, Benjamin F (2017) Detecting Long-Term Balancing Selection Using Allele Frequency Correlation. Mol Biol Evol 34:2996-3005
Flannick, Jason (see original citation for additional authors) (2017) Sequence data and association statistics from 12,940 type 2 diabetes cases and controls. Sci Data 4:170179
Aikens, Rachael C; Zhao, Wei; Saleheen, Danish et al. (2017) Systolic Blood Pressure and Risk of Type 2 Diabetes: A Mendelian Randomization Study. Diabetes 66:543-550
Mishra, Rajashree; Chesi, Alessandra; Cousminer, Diana L et al. (2017) Relative contribution of type 1 and type 2 diabetes loci to the genetic etiology of adult-onset, non-insulin-requiring autoimmune diabetes. BMC Med 15:88
Aggarwala, Varun; Voight, Benjamin F (2016) An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat Genet 48:349-355
Ehret, Georg B (see original citation for additional authors) (2016) The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. Nat Genet 48:1171-1184

Showing the most recent 10 out of 15 publications