Over the past several years, the genome-wide association study (GWAS) paradigm has implicated hundreds of genetic regions in complex human traits. Despite these successes, the statistical analyses in most published work are based on single genetic markers, and prior biological knowledge about genetic markers is rarely used. From both statistical and biological points of view, the rich information in the collected GWAS data has not been fully utilized to reveal disease etiologies. To address these critical needs, many research groups have been actively developing statistical and computational methods that can jointly analyze multiple markers, both within a region and across regions, and methods that more effectively incorporate other sources of information on genetic markers, genes, and pathways into association analysis. The long-term goals of this application are to develop and implement novel statistical methods to identify genes affecting an individual's susceptibility to complex traits, to apply these methods to ongoing studies to enable more biological findings, and to disseminate these tools to the general research community.
To achieve these broad goals, we propose to accomplish the following specific aims: (1) to develop statistical methods to identify markers that are informative about an individual's ancestry, and to use this information to adjust more effectively for sample heterogeneity in genetic association studies; (2) to develop statistical methods that can perform multi-marker analysis more efficiently, and to evaluate the statistical power of different marker search strategies; (3) to develop statistical methods that can systematically integrate different sources of information, especially biological pathways and networks, to increase our power to identify markers truly associated with complex diseases; (4) to develop statistical methods that use resequencing data to identify genetic associations between phenotypes and candidate regions. In addition, we will collaborate with leading human geneticists to apply the statistical methods to a wide array of diseases and refine them, and to disseminate well-tested and validated programs to the scientific community.

Public Health Relevance

It is well known that genetics plays a major role in many complex human diseases, e.g., cancer, hypertension, and mental disorders. However, very few genes had been firmly implicated in these disorders until a few years ago. With the introduction of high-density platforms, on which hundreds of thousands of genetic variants can be monitored simultaneously, and the formation of large collaborative projects, in which thousands of patients are jointly analyzed, the field of human genetics has recently undergone a revolution. Hundreds of genomic regions have been found to affect the risks of dozens of diseases, and this list will likely keep growing in the foreseeable future. These rich data have generated many statistical challenges, especially with the rapid development of resequencing technologies. This project will develop novel and powerful statistical methods to enable human geneticists to make the most of the valuable data collected. Through extensive collaborations, our methods will be applied to many ongoing studies to identify more genomic regions and biological pathways for complex diseases. We will also distribute the well-tested computer programs so that other researchers can use the statistical tools we develop.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-GGG-F (02))
Program Officer
Krasnewich, Donna M
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Sun, Jiehuan; Herazo-Maya, Jose D; Huang, Xiu et al. (2018) Distance-correlation based gene set analysis in longitudinal studies. Stat Appl Genet Mol Biol 17:
Wang, Tao; Zhao, Hongyu (2017) A Dirichlet-tree multinomial regression model for associating dietary nutrients with gut microorganisms. Biometrics 73:792-801
Liu, Yiyi; Zhao, Hongyu (2017) Variable importance-weighted Random Forests. Quant Biol 5:338-351
Sun, Jiehuan; Herazo-Maya, Jose D; Kaminski, Naftali et al. (2017) A Dirichlet process mixture model for clustering longitudinal gene expression data. Stat Med 36:3495-3506
Yan, Xiting; Liang, Anqi; Gomez, Jose et al. (2017) A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression. BMC Bioinformatics 18:309
Chung, Dongjun; Kim, Hang J; Zhao, Hongyu (2017) graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture. PLoS Comput Biol 13:e1005388
Hou, Lin; Sun, Ning; Mane, Shrikant et al. (2017) Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study. Genet Epidemiol 41:152-162
Lin, Zhixiang; Wang, Tao; Yang, Can et al. (2017) On joint estimation of Gaussian graphical models for spatial and temporal data. Biometrics 73:769-779
Sun, Jiehuan; Warren, Joshua L; Zhao, Hongyu (2017) A Bayesian semiparametric factor analysis model for subtype identification. Stat Appl Genet Mol Biol 16:145-158
Zhu, Ruoqing; Zhao, Ying-Qi; Chen, Guanhua et al. (2017) Greedy outcome weighted tree learning of optimal personalized treatment rules. Biometrics 73:391-400

Showing the most recent 10 out of 190 publications