The long-term objective of this research is to develop powerful statistical methods for analyzing data from genetic epidemiology studies. Although voluminous data are becoming available owing to the Human Genome Project and the rapid advancement of high-throughput genotyping technology, powerful statistical methods are still needed to identify predisposing genetic variants and their environmental modifiers. This project focuses on developing statistical methods for genetic association studies of perinatal or early-life diseases. These studies very often adopt a retrospective case-control design, but they have a distinct feature: the offspring of case and control mothers (for perinatal diseases) or the parents of case and control offspring (for early-life diseases) are also recruited. These studies therefore carry information from both unrelated case-control comparisons and genotype/haplotype transmissions within families. Another important feature is that the covariate distribution in the study population is structured, so that genetic and environmental variables are usually independent within families. Because this independence does not hold in the case population under the alternative hypothesis, it provides information on the association beyond the standard case-control comparison. These studies usually seek to evaluate the effects of both maternal and offspring genotypes/haplotypes, their interactions, and gene-environment interactions. Building on currently available approaches for analyzing case-control association studies and case-parent triads, we propose novel, efficient estimation and testing methods that account for the retrospective case-control design and incorporate the family information on genotype/haplotype transmission and the structure in the covariate distribution.
Classical logistic regression for case-control studies is applicable to most of these analyses but is less efficient because it ignores the family information and the covariate structure. Transmission/disequilibrium-type tests and likelihood-based methods for analyzing case-parent triads discard the controls and/or their parents and cannot estimate all parameters of interest (e.g., main effects of environmental exposures). Our methods range from profile-likelihood methods and estimating-function-based methods to hybrid methods based on the conditional likelihood for case triads and pseudo-likelihoods. This project is motivated by, and will be applied to, ongoing scientific studies at the University of Pennsylvania on which the PI is collaborating; the phenotypes include preterm birth, preeclampsia, hypospadias, and asthma. Our methods also have broad implications for the study of phenotypes other than perinatal and early-life diseases. We will develop large-sample theory for the proposed methods, evaluate their finite-sample performance in simulation studies, and demonstrate their usefulness on real data. Fully documented software implementing these methods will be made publicly available in the freely available statistical package R.
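The point about gene-environment independence can be illustrated numerically. The sketch below is not one of the proposed methods. It assumes a binary genotype G and a binary exposure E that are independent in the population, and a logistic disease model with hypothetical coefficients `b0`, `b_g`, `b_e`, and `b_ge`; the function name and the default frequencies are likewise assumptions made for illustration. It computes the G-E odds ratio among cases, which departs from 1 when an interaction is present even though G and E are independent in the population.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def case_ge_odds_ratio(b0, b_g, b_e, b_ge, p_g=0.3, p_e=0.4):
    """G-E odds ratio among cases, with G and E independent in the population.

    Assumed (hypothetical) disease model:
        P(D=1 | G=g, E=e) = expit(b0 + b_g*g + b_e*e + b_ge*g*e)
    p_g and p_e are the population frequencies of G=1 and E=1.
    """
    joint = {}  # joint[(g, e)] = P(G=g, E=e, D=1)
    for g in (0, 1):
        for e in (0, 1):
            # Independence of G and E in the population: probabilities multiply.
            p_cell = (p_g if g else 1.0 - p_g) * (p_e if e else 1.0 - p_e)
            risk = expit(b0 + b_g * g + b_e * e + b_ge * g * e)
            joint[(g, e)] = p_cell * risk
    # Odds ratio of the 2x2 table of G by E restricted to cases.
    return joint[(1, 1)] * joint[(0, 0)] / (joint[(1, 0)] * joint[(0, 1)])


if __name__ == "__main__":
    # No genetic effect at all: G and E remain independent among cases.
    print(case_ge_odds_ratio(b0=-6.0, b_g=0.0, b_e=0.3, b_ge=0.0))
    # Rare disease with interaction: the case-only odds ratio is close to exp(b_ge).
    print(case_ge_odds_ratio(b0=-6.0, b_g=0.2, b_e=0.3, b_ge=math.log(2.0)))
```

For a rare disease the case-only odds ratio approximates exp(b_ge), which is the extra information that a standard case-control comparison does not use.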

National Institutes of Health (NIH)
National Institute of Environmental Health Sciences (NIEHS)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
McAllister, Kimberly A
University of Pennsylvania
Biostatistics & Other Math Sci
Schools of Medicine
United States
Li, Huilin; Chen, Jinbo (2016) Efficient unified rare variant association test by modeling the population genetic distribution in case-control studies. Genet Epidemiol 40:579-590
Chen, Lu; Weinberg, Clarice R; Chen, Jinbo (2016) Using family members to augment genetic case-control studies of a life-threatening disease. Stat Med 35:2815-30
Shen, Yuanyuan; Cai, Tianxi; Chen, Yu et al. (2015) Retrospective likelihood-based methods for analyzing case-cohort genetic association studies. Biometrics 71:960-8
Yu, Kai; Zhang, Han; Wheeler, William et al. (2015) A robust association test for detecting genetic variants with heterogeneous effects. Biostatistics 16:5-16
Lin, Dongyu; Weinberg, Clarice R; Feng, Rui et al. (2013) A multi-locus likelihood method for assessing parent-of-origin effects using case-control mother-child pairs. Genet Epidemiol 37:152-62
Kang, Guolian; Lin, Dongyu; Hakonarson, Hakon et al. (2012) Two-stage extreme phenotype sequencing design for discovering and testing common and rare genetic variants: efficiency and power. Hum Hered 73:139-47
Chen, Jinbo; Kang, Guolian; Vanderweele, Tyler et al. (2012) Efficient designs of gene-environment interaction studies: implications of Hardy-Weinberg equilibrium and gene-environment independence. Stat Med 31:2516-30
Chen, Jinbo; Lin, Dongyu; Hochner, Hagit (2012) Semiparametric maximum likelihood methods for analyzing genetic and environmental effects with case-control mother-child pair data. Biometrics 68:869-77
Chen, Hua Yun; Chen, Jinbo (2011) On information coded in gene-environment independence in case-control studies. Am J Epidemiol 174:736-43
Feng, Rui; Wu, Yinghua; Jang, Gun Ho et al. (2011) A powerful test of parent-of-origin effects for quantitative traits using haplotypes. PLoS One 6:e28909
