Over the past fifteen years, great effort has gone into understanding the genetic architecture of complex human diseases through genome-wide association studies. Although many genome-wide significant variants have been identified, the heritability or variance explained by these variants remains very small. This suggests substantial missing heritability that may yet be explained by common genetic variants with smaller effect sizes and/or by rare and low-frequency variants, and it calls for the development and application of novel statistical methods for whole genome/exome sequencing data collected from deeply phenotyped cohorts. In this project, we will develop methods that leverage multiple correlated endophenotypes and further integrate functional annotation data to identify novel rare variants for complex traits. We will develop a set of new computational and analytical tools that are practically useful and broadly applicable to general sequencing studies, and the applications of our methods will likely identify novel rare variant associations and shed new light on the genetics of cardiometabolic diseases.
In Aim 1, we propose to develop novel statistical methods that integrate multiple endophenotypes to study the impact of rare variants on complex human diseases. Our methods will fill the gap between the current practice of association studies and the practical need to integrate endophenotypes for improved understanding and diagnosis of clinical outcomes.
In Aim 2, we will extend the methods to meta-analyses across studies.
In Aim 3, we will develop a novel kernel machine learning approach that integrates various sources of functional information to annotate whole-genome regions, and we will use these annotations to develop a dynamic whole-genome scan test to detect rare variant associations with multiple endophenotypes. We will leverage the NHLBI TOPMed whole genome sequencing (WGS) data and the UK Biobank whole exome sequencing (WES) data, together with the functional annotation data, to identify and dissect the role of rare variants in cardiometabolic traits (Aim 4). Our proposed work is cost-effective because it leverages existing WGS/WES samples and functional annotation data while providing methods and tools that are broadly applicable to other studies, and it builds on a strong team of scientists with a proven track record in statistical genetics, large-scale genetic studies, and cardiometabolic traits. We expect our methods to lead to the discovery of many more rare and low-frequency variants for these traits. These results will offer new insights to help design more effective treatment and prevention strategies. All our proposed methods will be disseminated to the public through well-tested and publicly available software (Aim 5).

Public Health Relevance

We will develop statistical methods and computational software to analyze whole genome/exome sequencing data by integrating multiple phenotypes and functional annotations, in order to identify novel genes associated with cardiometabolic diseases and to allow researchers to better disentangle the impact of rare variants on these complex human diseases. This work will also facilitate the translation of basic research findings into clinical studies.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
Yale University
Biostatistics & Other Math Sci
Schools of Public Health
New Haven
United States