Identification of rare genetic variants that predispose individuals to complex diseases, such as obesity, heart disease, and type 2 diabetes (T2D), is an important step toward understanding disease etiology, which in turn could lead to breakthroughs in diagnosis, prevention, and treatment. Recent large-scale sequencing studies have begun to identify rare disease-susceptibility variants, and further discoveries will be facilitated by more efficient study designs and by powerful statistical methods that integrate all available data. When multiple studies investigate the same disease or trait, integrating them via meta-analysis greatly improves the power to identify rare disease-susceptibility variants. Additionally, sample size, and hence power, can be increased by using sequenced samples from studies of other diseases as controls. Finally, incorporating functional information on rare variants, collected from various experiments, into association tests can improve power. Our proposal offers critical methodological improvements for each of these three strategies, which will increase power substantially. Specifically, we will develop 1) robust meta-analysis methods for rare-variant association tests of binary traits; 2) methods that use external samples as controls to increase power while controlling for possible batch effects; and 3) an integrative approach for testing non-coding regions that incorporates functional annotations. The proposed methods will be evaluated through extensive simulation studies and applications to multiple real datasets. In addition, we will continue to develop, distribute, and support open-source software packages for the proposed methods, and to update and support our current software.
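The meta-analysis strategy described above can be illustrated with a minimal sketch of a pooled weighted burden score test. It assumes that each study shares only per-variant score statistics and their null covariance matrix, that studies are independent, and that the variant effect is homogeneous across studies; under these assumptions the study-level scores and covariances can simply be summed. The function names, the input layout, and the per-variant `weights` (which could, for example, encode functional annotations) are hypothetical illustrations, not the proposed methods. The sketch uses a plain chi-square approximation and does not attempt the robustness for binary traits that the proposal targets.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input layout: one (scores, cov) pair per study, where scores[j]
# is the score statistic for variant j and cov is its covariance under the null.
Study = Tuple[Sequence[float], Sequence[Sequence[float]]]


def weighted_burden(scores: Sequence[float],
                    cov: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> Tuple[float, float]:
    """Collapse one study's variant scores into a burden score w'U and its variance w'Vw."""
    m = len(weights)
    if len(scores) != m or len(cov) != m:
        raise ValueError("scores, cov and weights must describe the same variants")
    s = sum(weights[j] * scores[j] for j in range(m))
    v = sum(weights[i] * cov[i][j] * weights[j] for i in range(m) for j in range(m))
    return s, v


def meta_burden_test(studies: List[Study], weights: Sequence[float]) -> Tuple[float, float]:
    """Pool independent studies by summing burden scores and variances; return (stat, p-value)."""
    total_s = 0.0
    total_v = 0.0
    for scores, cov in studies:
        s, v = weighted_burden(scores, cov, weights)
        total_s += s
        total_v += v
    if total_v <= 0.0:
        raise ValueError("pooled variance must be positive")
    stat = total_s ** 2 / total_v  # approximately chi-square with 1 df under the null
    # Upper tail of chi-square(1): P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Two toy studies with two variants each (made-up numbers for illustration only).
    study_a = ([2.0, 1.0], [[4.0, 0.0], [0.0, 1.0]])
    study_b = ([1.0, 0.5], [[2.0, 0.0], [0.0, 0.5]])
    stat, p = meta_burden_test([study_a, study_b], weights=[1.0, 1.0])
    print(f"pooled stat = {stat:.2f}, p = {p:.3f}")
```

With these toy numbers, study A alone gives a statistic of 9 / 5 = 1.8 and study B alone gives 2.25 / 2.5 = 0.9, while the pooled analysis gives 4.5² / 7.5 = 2.7, showing how combining studies with consistent effects accumulates evidence.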

Public Health Relevance

Complex diseases such as obesity, heart disease, and type 2 diabetes (T2D) are major public health concerns. The proposed research will develop advanced computational and statistical methods to improve the power to identify rare disease-susceptibility variants. These power gains will translate into a better understanding of human disease etiology and, eventually, into improvements in human health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa
University of Michigan Ann Arbor
Biostatistics & Other Math Sci
Schools of Public Health
Ann Arbor
United States
Dutta, Diptavo; Scott, Laura; Boehnke, Michael et al. (2018) Multi-SKAT: General framework to test for rare-variant association with multiple phenotypes. Genet Epidemiol
Zhou, Wei; Nielsen, Jonas B; Fritsche, Lars G et al. (2018) Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet 50:1335-1341
Dey, Rounak; Schmidt, Ellen M; Abecasis, Goncalo R et al. (2017) A Fast and Accurate Algorithm to Test for Binary Phenotypes and Its Application to PheWAS. Am J Hum Genet 101:37-49
Lee, Seunggeun; Kim, Sehee; Fuchsberger, Christian (2017) Improving power for rare-variant tests by integrating external controls. Genet Epidemiol 41:610-619
He, Zihuai; Lee, Seunggeun; Zhang, Min et al. (2017) Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA). Genet Epidemiol 41:801-810
Lee, Seunggeun; Sun, Wei; Wright, Fred A et al. (2017) An improved and explicit surrogate variable analysis procedure by coefficient adjustment. Biometrika 2:303-316