Rare and highly disease-penetrant human genetic variation offers potential to accelerate our understanding of mechanisms and pathways that contribute to type-2 diabetes (T2D), opening opportunities to translate findings into therapeutic targets and improved individual risk stratification. Motivated by this potential, large-scale sequencing studies have been undertaken to systematically catalog rare variation (<<1%) across the entire genome. These efforts have revealed thousands of rare variants of unclear functional significance. The identification of causal rare variants has been hindered in three ways. First, current analytical practices focus on the coding genome for ease of interpretation, leaving the role of noncoding variation unevaluated despite its clear importance for disease risk. Second, ignoring the polygenic nature of T2D, rare variant burden is calculated at the level of individual genes rather than across biological networks of genes or potentially functional noncoding regions, because computational methods for credible grouping and systematic collective evaluation are lacking. Finally, existing algorithms to identify pathogenic candidates are underpowered, a problem that is compounded by prohibitive replication costs and that impedes efforts to demonstrate compelling statistical association between rare variants and disease. Overcoming these challenges will allow us to evaluate the hypothesis that rare, particularly noncoding, variation contributes to T2D risk, which is the aim of this proposal.
We will: (1) develop algorithms to model expected levels of coding and noncoding polymorphism across human populations using features empirically learned from publicly available data sets (1000 Genomes, NHLBI Exomes), implemented in a new rare variant burden test for association; (2) develop computational informatics and systems-based approaches to uncover pathogenic T2D gene networks based on genetic data from hundreds of loci implicated in T2D risk and related traits; (3) apply our new algorithms and identified gene networks to evaluate rare variant burden for T2D in ~2850 individuals sequenced across the genome; and (4) demonstrate T2D relevance via replication using cost-effective multiplex targeted re-sequencing in >33,000 individuals. Completion of these aims will result in the development and public release of software for the analysis of noncoding variation, and in the identification of networks and rare variants contributing to T2D susceptibility.
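For readers unfamiliar with the term, the general idea behind a rare variant burden test can be illustrated with a minimal sketch. This is not the test proposed in aim (1), which models expected polymorphism levels; it is a generic collapsing test, assuming a case/control design in which each individual is scored as a carrier if any rare allele is present in the tested region (a gene, a gene network, or a noncoding element), and carrier counts are compared with a one-sided Fisher exact test. The function names and toy genotypes are hypothetical.

```python
from math import comb


def carrier_counts(genotypes, is_case):
    """Collapse a region: an individual is a carrier if any rare allele is present.

    genotypes: one row per individual, one rare-allele count per variant.
    is_case:   one boolean per individual (True = case, False = control).
    """
    case_carriers = 0
    control_carriers = 0
    for row, case in zip(genotypes, is_case):
        carrier = any(count > 0 for count in row)
        if case:
            case_carriers += carrier
        else:
            control_carriers += carrier
    return case_carriers, control_carriers


def burden_p_value(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact test for an excess of carriers among cases.

    Returns the hypergeometric probability of observing at least
    `case_carriers` carriers among cases, given the table margins.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(total, n_cases)


# Toy data (hypothetical): 4 cases then 4 controls, 3 rare variants in one region.
toy_genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],  # cases: 3 carriers
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],  # controls: 0 carriers
]
toy_is_case = [True] * 4 + [False] * 4

cases, controls = carrier_counts(toy_genotypes, toy_is_case)
p_value = burden_p_value(cases, 4, controls, 4)
print(f"case carriers={cases}, control carriers={controls}, p={p_value:.4f}")
```

Collapsing variants in this way trades per-variant resolution for power, which is why the choice of grouping (single genes versus networks or noncoding regions) matters for the aims above.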

Public Health Relevance

Worldwide, the increasing incidence of type-2 diabetes (T2D) is placing a critical burden on health care, demanding new approaches to treatment and intervention. Rare noncoding mutations with large effects on T2D predisposition offer promise for innovations in clinical practice, though identifying these mutations remains challenging. To overcome this challenge, our proposal develops new computational methodology to pinpoint relevant variation and the pathways in which it falls, and undertakes replication efforts to demonstrate conclusive association.

National Institutes of Health (NIH)
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Blondel, Olivier
University of Pennsylvania
Schools of Medicine
United States