Many health conditions, including substance use and mental illnesses, are complex and depend on both genetic and environmental factors. In the past several years, genome-wide association (GWA) studies have identified single-nucleotide polymorphisms implicating hundreds of robustly replicated loci for common traits. Despite numerous successes, it remains difficult to identify the genes, environmental factors, and interactions among them that underlie complex diseases, a problem that has been referred to as the geneticist's nightmare. Most of the identified variants confer low risk and account for little heritability, and increasing attention has turned to finding the "missing heritability" of complex diseases. To this end, it is important to develop novel statistical methods. Our preliminary progress demonstrates that our proposed methods have already produced significant findings on the associations among genes, environments, and complex traits. Several genetic variants that we identified with our novel methods will be cataloged by the National Human Genome Research Institute. This project will take advantage of the PI's many years of experience in collecting and analyzing GWA study data and will build on his success in developing statistical methods and software for genetic studies. The primary aim of this application is to continue our effort and success in developing, evaluating, and applying new statistical models, methods, and software to conduct GWA analyses of complex diseases.
Our specific aims are as follows: (A1) to develop statistical methods for inference on multidimensional and multimodal traits, which will seek the hidden heritability by incorporating multiple variants, simultaneously considering genetics and environment, and modeling multiple and heterogeneous traits; (A2) to develop tree- and forest-based methods for association analyses that incorporate multiple genetic variants, covariates, gene-covariate interactions, and existing biological information; and (A3) to develop software and release it for public use through the PI's website. As the methods and software are developed, they will be applied to a variety of real studies that will serve as motivation for and validation of our methods and software. In this regard, our secondary aims are to (B1) identify genes and environmental factors for addiction, mental illnesses, and the comorbidity of psychiatric disorders; and (B2) identify genetic variants and environmental factors for preterm delivery. In short, the objective of this project is significant, the foundation of our approach has been tested, and the new developments will be novel and useful. The PI has decades of experience related to this project and leads a research center with well-established infrastructure and supporting personnel and students.
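Aim A2 builds on recursive partitioning, in which a sample is split repeatedly on whichever variable best separates cases from controls. As a rough illustration only, the sketch below implements a generic CART-style split search using Gini impurity on hypothetical toy data. It is not the project's proposed tree- or forest-based method; the function names (`gini`, `best_split`, `grow`), the additive genotype coding (0/1/2 minor-allele count), the binary exposure, and the depth limit are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Row = List[float]


def gini(labels: List[int]) -> float:
    """Gini impurity 2p(1-p) of binary case/control labels (1 = case)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows: List[Row], labels: List[int]) -> Optional[Tuple[int, float, float]]:
    """Search every feature j and threshold t for the split x_j <= t with the
    largest decrease in weighted Gini impurity; return (j, t, decrease)."""
    n = len(labels)
    parent = gini(labels)
    best: Optional[Tuple[int, float, float]] = None
    for j in range(len(rows[0])):
        # Drop the largest observed value so both daughter nodes are non-empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[2]:
                best = (j, t, parent - child)
    return best


def grow(rows: List[Row], labels: List[int], depth: int = 0, max_depth: int = 2) -> Dict:
    """Recursively partition the sample; each leaf reports its observed case proportion."""
    split = None
    if depth < max_depth and len(set(labels)) > 1:
        split = best_split(rows, labels)
    if split is None or split[2] <= 0.0:
        return {"leaf": True, "n": len(labels), "risk": sum(labels) / len(labels)}
    j, t, _ = split
    go_left = [r[j] <= t for r in rows]
    return {
        "leaf": False,
        "feature": j,
        "threshold": t,
        "left": grow([r for r, g in zip(rows, go_left) if g],
                     [y for y, g in zip(labels, go_left) if g], depth + 1, max_depth),
        "right": grow([r for r, g in zip(rows, go_left) if not g],
                      [y for y, g in zip(labels, go_left) if not g], depth + 1, max_depth),
    }


# Hypothetical toy data: column 0 = SNP genotype (minor-allele count 0/1/2),
# column 1 = binary environmental exposure. Cases arise only when a carrier
# (genotype >= 1) is also exposed, i.e. a pure gene-environment interaction.
rows = [[g, e] for g in (0, 1, 2) for e in (0, 1)]
labels = [1 if g >= 1 and e == 1 else 0 for g, e in rows]
tree = grow(rows, labels)
```

On this toy data the root split is on exposure, and a genotype split appears only in the exposed branch. Nested splits of this kind are one way a tree can represent a gene-covariate interaction without a prespecified product term; a forest would average many such trees grown on resampled data.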

Public Health Relevance

Despite great advances in technology and methodology that have led to recent successes in identifying genetic variants for complex diseases, the development of novel statistical methods remains critically important for addressing difficulties inherent in genetic studies of complex phenotypes. This project will have a significant impact on the analysis of genetic data, and hence on public health, because our methods and software can help investigators understand the genetic and environmental factors of common and complex diseases, including substance use, cancer, and preterm birth.

National Institutes of Health (NIH)
National Institute on Drug Abuse (NIDA)
Research Project (R01)
Study Section: Cardiovascular and Sleep Epidemiology (CASE)
Program Officer: Wideroff, Louise
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Zhao, Jiwei; Zhang, Heping (2016) Modeling Multiple Responses via Bootstrapping Margins with an Application to Genetic Association Testing. Stat Interface 9:47-56
Cao, Taoyun; Wang, Xueqin; Zhang, Heping (2016) Energy bagging tree. Stat Interface 9:171-181
Jiang, Yuan; He, Yunxiao; Zhang, Heping (2016) Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method. J Am Stat Assoc 111:355-376
Liu, Dungang; Liu, Regina; Xie, Minge (2015) Multivariate Meta-Analysis of Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness. J Am Stat Assoc 110:326-340
Ment, Laura R; Ådén, Ulrika; Bauer, Charles R et al. (2015) Genes and environment in neonatal intraventricular hemorrhage. Semin Perinatol 39:592-603
Zuo, Lingjun; Saba, Laura; Lin, Xiandong et al. (2015) Significant association between rare IPO11-HTR1A variants and attention deficit hyperactivity disorder in Caucasians. Am J Med Genet B Neuropsychiatr Genet 168:544-556
Xiao, Feifei; Min, Xiaoyi; Zhang, Heping (2015) Modified screening and ranking algorithm for copy number variation detection. Bioinformatics 31:1341-1348
Gueorguieva, Ralitza; Wu, Ran; Tsai, Wan-Min et al. (2015) An analysis of moderators in the COMBINE study: Identifying subgroups of patients who benefit from acamprosate. Eur Neuropsychopharmacol 25:1586-1599
Hou, Jue; Seneviratne, Chamindi; Su, Xiaogang et al. (2015) Subgroup Identification in Personalized Treatment of Alcohol Dependence. Alcohol Clin Exp Res 39:1253-1259
Zhang, Heping; Baldwin, Don A; Bukowski, Radek K et al. (2015) A genome-wide association study of early spontaneous preterm delivery. Genet Epidemiol 39:217-226

These are the 10 most recent of 85 publications.