Many health conditions, including substance use and mental illnesses, are complex and depend on both genetic and environmental factors. In the past several years, genome-wide association (GWA) studies have identified single-nucleotide polymorphisms implicating hundreds of robustly replicated loci for common traits. Despite these successes, it remains persistently difficult to identify the genes, environmental factors, and interactions among them that underlie complex diseases. This difficulty has been referred to as the geneticist's nightmare. Most of the identified variants carry low associated risks and account for little heritability, and there is increasing interest in finding the "missing heritability" of complex diseases. To this end, it is important to develop novel statistical methods. Our Preliminary Progress demonstrates that our proposed methods have already produced significant findings on the association between genes, environments, and complex traits. Several genetic variants that we identified with our novel methods will be cataloged by the National Human Genome Research Institute. This project will take advantage of the PI's many years of experience in the data collection and analysis of GWA studies and build on his success in developing statistical methods and software for genetic studies. The primary aim of this application is to continue our effort and success in developing, evaluating, and applying new statistical models, methods, and software to conduct GWA analyses of complex diseases.
Our specific aims are as follows: (A1) to develop statistical methods to perform inference for multidimensional and multi-modal traits, where new methods will be developed to find the hidden heritability by incorporating multiple variants, simultaneously considering genetics and environment, and modeling multiple and heterogeneous traits; (A2) to develop tree- and forest-based methods for association analyses by incorporating multiple genetic variants, covariates, gene-covariate interactions, and existing biological information; and (A3) to develop and release software for public use through the PI's website. As the methods and software are developed, they will be applied to a variety of real studies that will serve as motivation for and validation of our methods and software. In this regard, our secondary aims are to (B1) identify genes and environmental factors for addiction, mental illnesses, and the co-morbidity of psychiatric disorders; and (B2) identify genetic variants and environmental factors for preterm deliveries. In short, the objective of this project is significant, the foundation of our approach has been tested, and the new development will be novel and useful. The PI has decades of experience related to this project and leads a research center with well-established infrastructure, supporting personnel, and students.

Public Health Relevance

Despite great advances in technology and methodology that have led to recent successes in identifying genetic variants for complex diseases, the development of novel statistical methods remains critically important for dealing with the difficulties inherent in genetic studies of complex phenotypes. This project will have a significant impact on the analysis of genetic data and hence on public health, because our methods and software can help investigators understand the genetic and environmental factors of common and complex diseases, including substance use, cancer, and preterm birth.

National Institutes of Health (NIH)
National Institute on Drug Abuse (NIDA)
Research Project (R01)
Study Section
Cardiovascular and Sleep Epidemiology (CASE)
Program Officer
Pollock, Jonathan D
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Song, Chi; Zhang, Heping (2014) TARV: tree-based analysis of rare variants identifying risk modifying variants in CTNNA2 and CNTNAP2 for alcohol addiction. Genet Epidemiol 38:552-9
Zuo, Lingjun; Wang, Kesheng; Wang, Guilin et al. (2014) Common PTP4A1-PHF3-EYS variants are specific for alcohol dependence. Am J Addict 23:411-4
Yang, Guang; Liu, Dungang; Liu, Regina Y et al. (2014) Efficient network meta-analysis: a confidence distribution approach. Stat Methodol 20:105-125
Xiao, Feifei; Ma, Jianzhong; Cai, Guoshuai et al. (2014) Natural and orthogonal model for estimating gene-gene interactions applied to cutaneous melanoma. Hum Genet 133:559-74
Jiang, Yuan; Li, Ni; Zhang, Heping (2014) Identifying Genetic Variants for Addiction via Propensity Score Adjusted Generalized Kendall's Tau. J Am Stat Assoc 109:905-930
Tan, H; Zhang, H; Xie, J et al. (2014) A novel staging model to classify oesophageal squamous cell carcinoma patients in China. Br J Cancer 110:2109-15
Ment, Laura R; Aden, Ulrika; Lin, Aiping et al. (2014) Gene-environment interactions in severe intraventricular hemorrhage of preterm neonates. Pediatr Res 75:241-50
Zuo, Lingjun; Zhang, Heping; Malison, Robert T et al. (2013) Rare ADH variant constellations are specific for alcohol dependence. Alcohol Alcohol 48:9-14
Xu, Yaji; Wu, Yinghua; Song, Chi et al. (2013) Simulating realistic genomic data with rare variants. Genet Epidemiol 37:163-72
