Hundreds of genetic regions have been implicated in complex human traits in the past several years through the genome-wide association study (GWAS) paradigm. Despite these successes, the statistical analyses in most published work were based on single genetic markers. In addition, prior biological knowledge about genetic markers is rarely used. From both statistical and biological points of view, the rich information in the collected GWAS data has not been fully exploited to reveal disease etiologies. To address these critical needs, many research groups have been actively developing statistical and computational methods that can jointly analyze multiple markers, both within and across regions, and methods that can more effectively incorporate other sources of information on genetic markers, genes, and pathways into association analysis. The long-term goals of this application are to develop and implement novel statistical methods to identify genes affecting an individual's susceptibility to complex traits, to apply these methods to ongoing studies to yield more biological findings, and to disseminate these tools to the general research community. 
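For readers less familiar with the single-marker paradigm described above, here is a minimal sketch of it, assuming a case-control design with genotype counts for each SNP. It runs an allelic chi-square test (1 degree of freedom) on each marker independently and applies a Bonferroni threshold. The function names and the toy counts are hypothetical illustrations, not the methods proposed in this project.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """1-df allelic chi-square test for one biallelic SNP.

    case_counts / control_counts: genotype counts (n_AA, n_Aa, n_aa).
    Returns (chi2, p_value).
    """
    # Convert genotype counts to allele counts (each person carries two alleles).
    a_case = 2 * case_counts[0] + case_counts[1]
    b_case = case_counts[1] + 2 * case_counts[2]
    a_ctrl = 2 * control_counts[0] + control_counts[1]
    b_ctrl = control_counts[1] + 2 * control_counts[2]
    table = [[a_case, b_case], [a_ctrl, b_ctrl]]
    n = a_case + b_case + a_ctrl + b_ctrl
    row = [sum(r) for r in table]
    col = [a_case + a_ctrl, b_case + b_ctrl]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty cells of a monomorphic SNP
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def single_marker_scan(snps, alpha=0.05):
    """Test each SNP on its own and flag those passing a Bonferroni threshold."""
    threshold = alpha / len(snps)
    results = {}
    for snp_id, (cases, controls) in snps.items():
        chi2, p = allelic_chi2(cases, controls)
        results[snp_id] = (chi2, p, p < threshold)
    return results


if __name__ == "__main__":
    # Hypothetical SNP IDs and counts, for illustration only.
    snps = {
        "rs_hypothetical_1": ((100, 400, 500), (60, 360, 580)),
        "rs_hypothetical_2": ((250, 500, 250), (250, 500, 250)),
    }
    for snp_id, (chi2, p, hit) in single_marker_scan(snps).items():
        print(snp_id, round(chi2, 2), f"{p:.2e}", hit)
```

Each marker is tested in isolation, and no prior biological knowledge enters the test. These are the two limitations the paragraph above identifies.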
To achieve these broad goals, we propose to accomplish the following specific aims: (1) to develop statistical methods to identify markers that are informative about an individual's ancestry, and to use this information to adjust more effectively for sample heterogeneity in genetic association studies; (2) to develop statistical methods that can perform multi-marker analysis more efficiently, and to evaluate the statistical power of different marker search strategies; (3) to develop statistical methods that can systematically integrate different sources of information, especially biological pathways and networks, to increase our power to identify markers truly associated with complex diseases; and (4) to develop statistical methods that use resequencing data to identify genetic associations between phenotypes and candidate regions. In addition, we will collaborate with leading human geneticists to apply and refine the statistical methods across a wide array of diseases, and to disseminate well-tested and validated programs to the scientific community.
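Aim (4) concerns resequencing data. In that setting, most variants in a candidate region are rare, so single-variant tests have little power. One widely used family of approaches collapses the rare variants in a region into a single carrier indicator before testing. The sketch below illustrates this under stated assumptions: a case-control sample, a fixed minor-allele-frequency cutoff of 1%, and Fisher's exact test on carrier counts. The function names, the cutoff, and the toy data are hypothetical and are not the methods this project proposes to develop.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = math.comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def collapsing_test(genotypes, is_case, maf, maf_cutoff=0.01):
    """Carrier-based collapsing (burden) test for one candidate region.

    genotypes: per-individual lists of minor-allele counts (0/1/2), one per variant.
    is_case:   per-individual booleans (True = case).
    maf:       minor allele frequency of each variant (assumed known).
    """
    rare = [j for j, f in enumerate(maf) if f < maf_cutoff]
    # An individual is a carrier if they have at least one rare minor allele.
    carrier = [any(g[j] > 0 for j in rare) for g in genotypes]
    pairs = list(zip(carrier, is_case))
    case_carr = sum(1 for c, y in pairs if c and y)
    case_non = sum(1 for c, y in pairs if not c and y)
    ctrl_carr = sum(1 for c, y in pairs if c and not y)
    ctrl_non = sum(1 for c, y in pairs if not c and not y)
    table = [[case_carr, case_non], [ctrl_carr, ctrl_non]]
    return table, fisher_exact_two_sided(case_carr, case_non, ctrl_carr, ctrl_non)


if __name__ == "__main__":
    # Toy region: variants 0 and 1 are rare, variant 2 is common (ignored).
    genotypes = (
        [[1, 0, 0], [0, 1, 1], [1, 0, 2], [0, 1, 0], [1, 1, 0]]  # 5 case carriers
        + [[0, 0, 1]] * 5                                         # 5 case non-carriers
        + [[0, 0, 1]] * 5 + [[0, 0, 0]] * 5                       # 10 control non-carriers
    )
    is_case = [True] * 10 + [False] * 10
    table, p = collapsing_test(genotypes, is_case, maf=[0.005, 0.008, 0.2])
    print(table, round(p, 4))  # [[5, 5], [0, 10]] 0.0325
```

Collapsing gains power only when most rare variants in the region shift risk in the same direction. When risk and protective variants are mixed, their signals cancel, which is one reason more flexible region-based tests have been developed.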

Public Health Relevance

It is well known that genetics plays a major role in many complex human diseases, e.g. cancer, hypertension, and mental disorders. However, very few genes had been firmly implicated in these disorders until a few years ago. With the introduction of high-density platforms that can monitor hundreds of thousands of genetic variants simultaneously, and the formation of large collaborative projects in which thousands of patients are jointly analyzed, human genetics has recently undergone a revolution. Hundreds of genomic regions have been found to affect the risks of dozens of diseases, and this list will likely keep growing for the foreseeable future. These rich data have raised many statistical challenges, especially with the rapid development of resequencing technologies. This project will develop novel and powerful statistical methods to help human geneticists make the most of the valuable data they have collected. Through extensive collaborations, our methods will be applied to many ongoing studies to identify more genomic regions and biological pathways for complex diseases. We will also distribute well-tested computer programs so that other researchers can use the statistical tools we develop.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-GGG-F (02))
Program Officer: Krasnewich, Donna M
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Yang, Can; Li, Cong; Kranzler, Henry R et al. (2014) Exploring the genetic architecture of alcohol dependence in African-Americans via analysis of a genomewide set of common variants. Hum Genet 133:617-24
Hou, Lin; Ma, TianZhou; Zhao, HongYu (2014) Incorporating functional annotation information in prioritizing disease associated SNPs from genome wide association studies. Sci China Life Sci 57:1072-9
Chung, Dongjun; Yang, Can; Li, Cong et al. (2014) GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genet 10:e1004787
Heffelfinger, Christopher; Fragoso, Christopher A; Moreno, Maria A et al. (2014) Flexible and scalable genotyping-by-sequencing strategies for population studies. BMC Genomics 15:979
Ryslik, Gregory A; Cheng, Yuwei; Cheung, Kei-Hoi et al. (2014) A spatial simulation approach to account for protein structure when identifying non-random somatic mutations. BMC Bioinformatics 15:231
Li, Cong; Yang, Can; Gelernter, Joel et al. (2014) Improving genetic risk prediction by leveraging pleiotropy. Hum Genet 133:639-50
Ryslik, Gregory A; Cheng, Yuwei; Cheung, Kei-Hoi et al. (2014) A graph theoretic approach to utilizing protein structure to identify non-random somatic mutations. BMC Bioinformatics 15:86
Hou, Lin; Chen, Min; Zhang, Clarence K et al. (2014) Guilt by rewiring: gene prioritization through network rewiring in genome wide association studies. Hum Mol Genet 23:2780-90
Tong, Tiejun; Feng, Zeny; Hilton, Julia S et al. (2013) Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values. J Appl Stat 40:1949-1964
Kim, Inyoung; Pang, Herbert; Zhao, Hongyu (2013) Statistical properties on semiparametric regression for evaluating pathway effects. J Stat Plan Inference 143:745-763

Showing the most recent 10 out of 132 publications