The advent of genomic and imaging technologies provides us with a great opportunity to study and understand health conditions, including substance use and mental illnesses, that are complex and depend on both genetic and environmental factors. Over the past decades, genome-wide association (GWA) studies have identified and robustly replicated numerous genetic variants associated with complex diseases. Despite these successes, it remains persistently difficult to identify the genes and environmental factors underlying these diseases, the so-called "geneticist's nightmare." Most of the identified variants confer small increases in risk and account for little of the heritability, and increasing attention is focused on finding the "missing heritability" of complex diseases. Furthermore, it has been documented that clinical contributions from neuropsychiatric research have been minimal because of traditionally small study sample sizes, biologically incorrect diagnostic labels, and the comorbidity and heterogeneity of these diseases. To address these problems and advance clinical science, we need to develop novel models and methods that use and interpret the available data efficiently. This is the primary motivation for our project. We will develop more efficient approaches that utilize biological information (genetic and/or phenotypic data) and directly address the comorbidity issue. In addition, we will analyze large datasets, such as the UK Biobank, that contain demographic, clinical, and genetic data. We will further draw on the investigators' many years of experience in collecting and analyzing data from GWA studies and build on our successes in developing and applying statistical methods and software for complex studies. The primary aim of this application is to develop, evaluate, and apply new statistical (both parametric and nonparametric) models, methods, and software for the genetic analysis of complex diseases. 
To address the challenges stated above, our proposed methods will target one or more of the following topics: (a) analysis of genetic, phenotypic, and environmental data; (b) modeling of comorbidity through multivariate traits; and (c) identification and incorporation of novel genetic variants, including their interactions with environmental factors. To do so, we will use and develop state-of-the-art statistical methodology and software, such as trees and forests. The success of our project will directly improve our understanding, and ultimately the treatment and prevention, of diseases that are of significant public health concern.

Public Health Relevance

Identifying genetic variants associated with complex diseases remains important for public health, yet challenging. This project aims to develop novel statistical methods to meet these challenges. Our methods and software can help investigators better understand the genetic and environmental factors underlying common and complex diseases, including learning disability and mental disorders, and ultimately develop treatment and prevention strategies for these diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Li, Rongling
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States