We propose to create new infrastructure and methods for genomic analysis and apply these to large, complex datasets for type 2 diabetes (T2D), a leading cause of morbidity and mortality that is driven by diverse genetic and environmental factors. This proposal has three primary scientific goals: (1) we will develop infrastructure and analytical tools to harmonize heterogeneous genomic datasets ascertained for the study of complex disease, as demonstrated on DNA sequencing data from over 50,000 individuals; (2) we will design statistical frameworks to identify functional mutations in T2D and analyze their biological consequences, taking advantage of existing data and resources on genetic variation, transcription, and epigenetics; and (3) we will democratize access to genomic data by creating user-friendly portals with automated analytical pipelines and intuitive features for data exploration. The software, methods, and web portals we build will help overcome the barriers that currently inhibit the translation of genomic data into biological knowledge and therapeutic insights for T2D.

Public Health Relevance

A major goal of biomedical research is to identify biological processes that underlie human diseases so that safe, effective therapies can be developed more rapidly and cost-effectively. New technology has made it possible to collect large-scale genomic datasets relevant to the genetic and molecular basis of disease, but interpreting this information is difficult due to challenges in accessing the underlying data, assessing its biological implications, and disseminating results so that biomedical researchers and ultimately patients can benefit. We will assemble a multidisciplinary team to overcome these challenges and create the methods and tools necessary to translate genomic big data into biological understanding, applying these methods and tools specifically to improving our understanding of type 2 diabetes.

National Institutes of Health (NIH)
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Specialized Center--Cooperative Agreements (U54)
Study Section: Special Emphasis Panel (ZRG1-BST-N (52))
Program Officer: Blondel, Olivier
Broad Institute, Inc.
United States
Walsh, Roddy; Thomson, Kate L; Ware, James S et al. (2016) Reassessment of Mendelian gene pathogenicity using 7,855 cardiomyopathy cases and 60,706 reference samples. Genet Med :
Zou, James; Valiant, Gregory; Valiant, Paul et al. (2016) Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects. Nat Commun 7:13293
Minikel, Eric Vallabh; Vallabh, Sonia M; Lek, Monkol et al. (2016) Quantifying prion disease penetrance using large population control cohorts. Sci Transl Med 8:322ra9
Narasimhan, Vagheesh M; Hunt, Karen A; Mason, Dan et al. (2016) Health and population effects of rare gene knockouts in adult humans with related parents. Science 352:474-7
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V et al. (2016) Analysis of protein-coding genetic variation in 60,706 humans. Nature 536:285-91
Minikel, Eric Vallabh; MacArthur, Daniel G (2016) Publicly Available Data Provide Evidence against NR1H3 R415Q Causing Multiple Sclerosis. Neuron 92:336-338