Many health conditions, including substance use and mental illnesses, are complex and depend on both genetic and environmental factors. In the past several years, genome-wide association (GWA) studies have identified single-nucleotide polymorphisms implicating hundreds of robustly replicated loci for common traits. Despite numerous successes, it remains persistently difficult to identify genes, environmental factors, and the interactions among them for complex diseases. This has been referred to as the geneticist's nightmare. Most of the identified variants carry low associated risks and account for little heritability, and there is increasing attention to finding the "missing heritability" of complex diseases. To this end, it is important to develop novel statistical methods. Our Preliminary Progress demonstrates that our proposed methods have already produced significant findings on the association between genes, environments, and complex traits. Several genetic variants that we identified with our novel methods will be cataloged by the National Human Genome Research Institute. This project will take advantage of the PI's many years of experience in the data collection and analysis of GWA studies and build on his success in the development of statistical methods and software for genetic studies. The primary aim of this application is to continue our effort and success in developing, evaluating, and applying new statistical models, methods, and software to conduct GWA analyses of complex diseases.
Our specific aims are as follows: (A1) to develop statistical methods to perform inference for multidimensional and multi-modal traits, where new methods will be developed to find the hidden heritability by incorporating multiple variants, simultaneously considering genetics and environment, and modeling multiple and heterogeneous traits; (A2) to develop tree- and forest-based methods for association analyses by incorporating multiple genetic variants, covariates, gene-covariate interactions, and existing biological information; and (A3) to develop and release software for public use through the PI's website. As the methods and software are developed, they will be applied to a variety of real studies that will serve as motivation and validation of our methods and software. In this regard, our secondary aims are to (B1) identify genes and environmental factors for addiction, mental illnesses, and the co-morbidity of psychiatric disorders; and (B2) identify genetic variants and environmental factors for preterm deliveries. In short, the objective of this project is significant, the foundation of our approach has been tested, and the new development will be novel and useful. The PI has decades of experience related to this project and leads a research center with well-established infrastructure and supporting personnel and students.

Public Health Relevance

Despite great advances in technology and methodology that have led to recent successes in identifying genetic variants for complex diseases, the development of novel statistical methods is critically important in dealing with difficulties inherent in genetic studies of complex phenotypes. This project will have a significant impact on the analysis of genetic data and hence on public health, because our methods and software can help investigators understand the genetic and environmental factors of common and complex diseases, including substance use, cancer, and preterm birth.

Agency
National Institutes of Health (NIH)
Institute
National Institute on Drug Abuse (NIDA)
Type
Research Project (R01)
Project #
2R01DA016750-09
Application #
8324400
Study Section
Cardiovascular and Sleep Epidemiology (CASE)
Program Officer
Wideroff, Louise
Project Start
2003-07-01
Project End
2017-03-31
Budget Start
2012-04-01
Budget End
2013-03-31
Support Year
9
Fiscal Year
2012
Total Cost
$290,143
Indirect Cost
$110,143
Name
Yale University
Department
Public Health & Prev Medicine
Type
Schools of Medicine
DUNS #
043207562
City
New Haven
State
CT
Country
United States
Zip Code
06520
Pan, Wenliang; Tian, Yuan; Wang, Xueqin et al. (2018) Ball divergence: Nonparametric two sample test. Ann Stat 46:1109-1137
You, Na; He, Shun; Wang, Xueqin et al. (2018) Subtype classification and heterogeneous prognosis model construction in precision medicine. Biometrics 74:814-822
Liu, Dungang; Zhang, Heping (2018) Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. J Am Stat Assoc 113:845-854
Guo, Xiaobo; Zhu, Junxian; Fan, Qiao et al. (2018) A univariate perspective of multivariate genome-wide association analysis. Genet Epidemiol 42:470-479
Wen, Canhong; Mehta, Chintan M; Tan, Haizhu et al. (2018) Whole genome association study of brain-wide imaging phenotypes: A study of the PING cohort. Genet Epidemiol 42:265-275
Mehta, Chintan M; Gruen, Jeffrey R; Zhang, Heping (2017) A method for integrating neuroimaging into genetic models of learning performance. Genet Epidemiol 41:4-17
Xiao, Feifei; Niu, Yue; Hao, Ning et al. (2017) modSaRa: a computationally efficient R package for CNV identification. Bioinformatics 33:2384-2385
Bi, Xuan; Yang, Liuqing; Li, Tengfei et al. (2017) Genome-wide mediation analysis of psychiatric and cognitive traits through imaging phenotypes. Hum Brain Mapp 38:4088-4097
Song, Chi; Min, Xiaoyi; Zhang, Heping (2016) The screening and ranking algorithm for change-points detection in multiple samples. Ann Appl Stat 10:2102-2129
Cao, Taoyun; Wang, Xueqin; Zhang, Heping (2016) Energy bagging tree. Stat Interface 9:171-181

Showing the most recent 10 out of 94 publications