The research tackles several challenges in large-scale biomedical studies through the development and implementation of powerful and theoretically sound statistical methods to detect robust biomarker associations. The developed methods are motivated by, and will have significant impact on, large-scale genetic association studies, and are also broadly applicable to other disciplines (e.g., survey sampling and mental health imaging studies). The developed methods will be applied to detect novel genetic biomarkers that are associated with multiple cardiometabolic traits and to generate novel hypotheses for biological and clinical investigation. The project will also solve some long-standing problems in statistics and will provide theoretically sound methods that are much more powerful than those commonly used. The project team will integrate the research results into training the next generation of undergraduate and graduate students in the fast-growing field of biomedical data science. This project will also promote teaching, training, and learning, and broaden the participation of students from under-represented groups.

Recent methodological and computational advances have facilitated the application of statistical methods to analyze simple (primarily single) disease outcomes in large-scale genome-wide association studies. However, these studies have identified only a small proportion of the risk variants, and there likely remain many more common variants with modest effect sizes and/or rare variants yet to be discovered. Existing methods and statistical theories are not adequate for analyzing high-dimensional association with clustered outcomes (e.g., longitudinal outcomes) or with multiple correlated and/or secondary outcomes. This project aims to address this urgent need by developing new statistical methods with a solid theoretical foundation that integrate multiple correlated phenotypes to identify novel genetic variants for complex traits. In particular, the project will (1) develop powerful statistical methods for testing high-dimensional association with clustered outcomes; (2) develop theoretically sound and powerful statistical methods for testing association with multiple secondary traits; and (3) develop and apply a unified modeling framework that applies the developed statistical methods, leverages whole genome sequencing data, and integrates functional annotation data to help identify and dissect the role of rare variants in cardiometabolic traits. The objective of the education plan is to integrate the latest research developments in statistical genetics into existing and new courses to prepare students for their future professions in biomedical and health informatics. The research will also include software development to implement the methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Program Officer: Junping Wang
University of Minnesota Twin Cities
United States