Over the past several years, the genome-wide association study (GWAS) paradigm has implicated hundreds of genetic regions in complex human traits. Despite these successes, the statistical analyses in most published work have been based on single genetic markers, and prior biological knowledge about genetic markers is rarely used. From both statistical and biological points of view, the rich information in the collected GWAS data has not been fully utilized to reveal disease etiologies. To address these critical needs, many research groups have been actively developing statistical and computational methods that jointly analyze multiple markers, both within a region and across regions, as well as methods that more effectively incorporate other sources of information on genetic markers, genes, and pathways into association analysis. The long-term goals of this application are to develop and implement novel statistical methods to identify genes affecting an individual's susceptibility to complex traits, to apply these methods to ongoing studies to enable more biological findings, and to disseminate these tools to the general research community.
To achieve these broad goals, we propose to accomplish the following specific aims: (1) to develop statistical methods to identify markers that are informative about an individual's ancestry, and to take advantage of this information for more effective adjustment of sample heterogeneity in genetic association studies; (2) to develop statistical methods that can more efficiently perform multi-marker analysis, and to evaluate the statistical power of different marker search strategies; (3) to develop statistical methods that can systematically integrate different sources of information, especially biological pathways and networks, to increase our power to identify markers truly associated with complex diseases; (4) to develop statistical methods to use resequencing data to identify genetic associations between phenotypes and candidate regions. In addition, we will collaborate with leading human geneticists to apply the statistical methods to a wide array of diseases, to refine them in the process, and to disseminate well-tested and validated programs to the scientific community.

Public Health Relevance

It is well known that genetics plays a major role in many complex human diseases, such as cancer, hypertension, and mental disorders. However, very few genes had been firmly implicated in these disorders until a few years ago. With the introduction of high-density platforms that monitor hundreds of thousands of genetic variants simultaneously, and the formation of large collaborative projects in which thousands of patients are jointly analyzed, the field of human genetics has recently undergone a revolution. Hundreds of genomic regions have been found to affect the risks of dozens of diseases, and this list will likely keep growing for the foreseeable future. These rich data have generated many statistical challenges, especially with the rapid development of resequencing technologies. This project will develop novel and powerful statistical methods to enable human geneticists to make the most of the valuable data collected. Through extensive collaborations, our methods will be applied to many ongoing studies to identify more genomic regions and biological pathways for complex diseases. We will also distribute well-tested computer programs so that other researchers can use the statistical tools we develop.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-GGG-F (02))
Program Officer
Krasnewich, Donna M
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Zhu, Ruoqing; Zhao, Ying-Qi; Chen, Guanhua et al. (2017) Greedy outcome weighted tree learning of optimal personalized treatment rules. Biometrics 73:391-400
Wang, Qian; Polimanti, Renato; Kranzler, Henry R et al. (2017) Genetic factor common to schizophrenia and HIV infection is associated with risky sexual behavior: antagonistic vs. synergistic pleiotropic SNPs enriched for distinctly different biological functions. Hum Genet 136:75-83
Yan, Xiting; Liang, Anqi; Gomez, Jose et al. (2017) A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression. BMC Bioinformatics 18:309
Sun, Jiehuan; Herazo-Maya, Jose D; Kaminski, Naftali et al. (2017) A Dirichlet process mixture model for clustering longitudinal gene expression data. Stat Med 36:3495-3506
Hou, Lin; Sun, Ning; Mane, Shrikant et al. (2017) Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study. Genet Epidemiol 41:152-162
Sun, Jiehuan; Warren, Joshua L; Zhao, Hongyu (2017) A Bayesian semiparametric factor analysis model for subtype identification. Stat Appl Genet Mol Biol 16:145-158
Dong, Kai; Zhao, Hongyu; Tong, Tiejun et al. (2016) NBLDA: negative binomial linear discriminant analysis for RNA-Seq data. BMC Bioinformatics 17:369
Xie, Zhongyu; Zhang, Di; Chung, Dongjun et al. (2016) Metabolic Regulation of Gene Expression by Histone Lysine β-Hydroxybutyrylation. Mol Cell 62:194-206
Lu, Qiongshi; Jin, Chentian; Sun, Jiehuan et al. (2016) Post-GWAS Prioritization Through Data Integration Provides Novel Insights on Chronic Obstructive Pulmonary Disease. Stat Biosci 2016:1-17
Fragoso, Christopher A; Heffelfinger, Christopher; Zhao, Hongyu et al. (2016) Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data. Genetics 202:487-95

Showing the most recent 10 out of 185 publications