Genes underlie numerous diseases. Gene-gene and gene-environment interactions are likely to be present in most common, complex diseases, including substance use disorders, cancer, and preterm birth and its sequelae. Demonstrating such interactions beyond chance is very difficult. The Human Genome Project and the HapMap Project have greatly advanced our ability to study genetic and environmental factors underlying complex diseases. In particular, high-throughput genotyping technologies have been evolving rapidly. However, the etiologies of many complex diseases remain poorly understood, and using this rich information to understand complex diseases remains a tremendous challenge. Advanced data analysis and data mining techniques have become indispensable in this endeavor. Developing powerful analytic methods to understand biological systems is the greatest challenge in using genomic information. In the last few years, several groups of investigators have successfully identified genes underlying various common, complex diseases using genome-wide association (GWA) studies. Those successes have led to the NIH-wide Gene Environment Initiatives to identify genetic variants for complex diseases. Recently, the PI has played a leading role in the planning, design, database development, statistical analysis, and study coordination for two major national networks of genetic studies using the genome-wide association study approach. This project will take advantage of the PI's involvement in those two studies and build on his success in developing statistical methods for genetic studies in the previous period. The primary aim of this application is to continue our efforts and successes in developing, evaluating, and applying new statistical (both parametric and nonparametric) models, methods, and software to conduct GWA analyses of complex diseases.
Specifically, we will develop (A1) statistical methods for genetic analysis of multiple traits; and (A2) tree- and forest-based models for association analyses of complex traits. Once these are developed, companion software will be written for all of these models and made available to the public on Dr. Zhang's website. Our methods and software will be applied to the data available to the PI to achieve the following secondary aims: to identify genes and environmental factors for tobacco use, substance use, and its comorbidity with psychiatric disorders including anxiety, which is a continuation of our previous effort; and to identify genetic variants and environmental factors for preterm deliveries and their sequelae, including intraventricular hemorrhage. Despite great advances in technology and methodology that have led to recent successes in identifying genetic variants for complex diseases, the development of novel statistical methods remains critically important for dealing with the difficulties inherent in genetic studies of complex phenotypes.

Public Health Relevance

This project will have a significant impact on the analysis of genetic data and hence on public health, because our methods and software can help investigators understand the genetic and environmental factors underlying common, complex diseases, including substance use, cancer, and preterm birth. That understanding can in turn lead to better prevention and treatment strategies.

Agency: National Institutes of Health (NIH)
Institute: National Institute on Drug Abuse (NIDA)
Type: Research Project (R01)
Project #: 5R01DA016750-07
Application #: 7802148
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Wideroff, Louise
Project Start: 2003-07-01
Project End: 2012-03-31
Budget Start: 2010-04-01
Budget End: 2011-03-31
Support Year: 7
Fiscal Year: 2010
Total Cost: $327,690
Indirect Cost:
Name: Yale University
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 043207562
City: New Haven
State: CT
Country: United States
Zip Code: 06520
Pan, Wenliang; Tian, Yuan; Wang, Xueqin et al. (2018) Ball divergence: Nonparametric two sample test. Ann Stat 46:1109-1137
You, Na; He, Shun; Wang, Xueqin et al. (2018) Subtype classification and heterogeneous prognosis model construction in precision medicine. Biometrics 74:814-822
Liu, Dungang; Zhang, Heping (2018) Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. J Am Stat Assoc 113:845-854
Guo, Xiaobo; Zhu, Junxian; Fan, Qiao et al. (2018) A univariate perspective of multivariate genome-wide association analysis. Genet Epidemiol 42:470-479
Wen, Canhong; Mehta, Chintan M; Tan, Haizhu et al. (2018) Whole genome association study of brain-wide imaging phenotypes: A study of the ping cohort. Genet Epidemiol 42:265-275
Mehta, Chintan M; Gruen, Jeffrey R; Zhang, Heping (2017) A method for integrating neuroimaging into genetic models of learning performance. Genet Epidemiol 41:4-17
Xiao, Feifei; Niu, Yue; Hao, Ning et al. (2017) modSaRa: a computationally efficient R package for CNV identification. Bioinformatics 33:2384-2385
Bi, Xuan; Yang, Liuqing; Li, Tengfei et al. (2017) Genome-wide mediation analysis of psychiatric and cognitive traits through imaging phenotypes. Hum Brain Mapp 38:4088-4097
Song, Chi; Min, Xiaoyi; Zhang, Heping (2016) The screening and ranking algorithm for change-points detection in multiple samples. Ann Appl Stat 10:2102-2129
Cao, Taoyun; Wang, Xueqin; Zhang, Heping (2016) Energy bagging tree. Stat Interface 9:171-181
