Hundreds of genetic regions have been implicated in complex human traits in the past several years through the genome-wide association study (GWAS) paradigm. Despite these successes, statistical analyses in most published work were based on single genetic markers. In addition, prior biological knowledge on genetic markers is rarely used. From both statistical and biological points of view, the rich information in the collected GWAS data has not been fully utilized to reveal disease etiologies. To address these critical needs, many research groups have been actively developing statistical and computational methods that can jointly analyze multiple markers, both within a region and across regions, and methods that can more effectively incorporate other sources of information on genetic markers, genes, and pathways in association analysis. The long-term goals of this application are to develop and implement novel statistical methods to identify genes affecting an individual's susceptibility to complex traits, to apply these methods to ongoing studies to enable more biological findings, and to disseminate these tools to the general research community.
To achieve these broad goals, we propose to accomplish the following specific aims: (1) to develop statistical methods to identify markers that are informative about an individual's ancestry, and to take advantage of this information for more effective adjustment of sample heterogeneity in genetic association studies; (2) to develop statistical methods that can more efficiently perform multi-marker analysis, and to evaluate the statistical power of different marker search strategies; (3) to develop statistical methods that can systematically integrate different sources of information, especially biological pathways and networks, to increase our power to identify markers truly associated with complex diseases; (4) to develop statistical methods to use resequencing data to identify genetic associations between phenotypes and candidate regions. In addition, we will collaborate with leading human geneticists to apply the statistical methods to a wide array of diseases and refine them, and to disseminate well-tested and validated programs to the scientific community.

Public Health Relevance

It is well known that genetics plays a major role in many complex human diseases, e.g., cancer, hypertension, and mental disorders. However, very few genes had been firmly implicated in these disorders until a few years ago. With the introduction of high-density platforms where hundreds of thousands of genetic variants can be monitored simultaneously and the formation of large collaborative projects where thousands of patients are jointly analyzed, the field of human genetics has recently undergone a revolution. Hundreds of genomic regions have been found to affect the risks of dozens of diseases, and this list will likely keep growing in the foreseeable future. These rich data have generated many statistical challenges, especially with the rapid development of resequencing technologies. This project will develop novel and powerful statistical methods to enable human geneticists to make the most of the valuable data collected. Through extensive collaborations, our methods will be applied to many ongoing studies to identify more genomic regions and biological pathways for complex diseases. We will also distribute the well-tested computer programs so that other researchers can use the statistical tools we develop.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-GGG-F (02))
Program Officer
Krasnewich, Donna M
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Lin, Zhixiang; Li, Mingfeng; Sestan, Nenad et al. (2016) A Markov random field-based approach for joint estimation of differentially expressed genes in mouse transcriptome data. Stat Appl Genet Mol Biol 15:139-50
Wang, Tao; Chen, Mengjie; Zhao, Hongyu (2016) Estimating DNA methylation levels by joint modeling of multiple methylation profiles from microarray data. Biometrics 72:354-63
Lu, Qiongshi; Jin, Chentian; Sun, Jiehuan et al. (2016) Post-GWAS Prioritization Through Data Integration Provides Novel Insights on Chronic Obstructive Pulmonary Disease. Stat Biosci 2016:1-17
Ryslik, Gregory A; Cheng, Yuwei; Modis, Yorgo et al. (2016) Leveraging protein quaternary structure to identify oncogenic driver mutations. BMC Bioinformatics 17:137
Lu, Qiongshi; Yao, Xinwei; Hu, Yiming et al. (2016) GenoWAP: GWAS signal prioritization through integrated analysis of genomic functional annotation. Bioinformatics 32:542-8
Fragoso, Christopher A; Heffelfinger, Christopher; Zhao, Hongyu et al. (2016) Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data. Genetics 202:487-95
Huang, Xiu; Stern, David F; Zhao, Hongyu (2016) Transcriptional Profiles from Paired Normal Samples Offer Complementary Information on Cancer Patient Survival--Evidence from TCGA Pan-Cancer Data. Sci Rep 6:20567
Hu, Yiming; Zhao, Hongyu (2016) CCor: A whole genome network-based similarity measure between two genes. Biometrics 72:1216-1225
Chen, Mengjie; Gao, Chao; Zhao, Hongyu (2016) Posterior Contraction Rates of the Phylogenetic Indian Buffet Processes. Bayesian Anal 11:477-497
Lu, Qiongshi; Powles, Ryan Lee; Wang, Qian et al. (2016) Integrative Tissue-Specific Functional Annotations in the Human Genome Provide Novel Insights on Many Complex Traits and Improve Signal Prioritization in Genome Wide Association Studies. PLoS Genet 12:e1005947

Showing the most recent 10 out of 173 publications