The long-term objective of this research is to develop powerful statistical methods for the analysis of data from genetic epidemiology studies. While voluminous data are becoming available owing to the Human Genome Project and the rapid advancement of high-throughput genotyping technology, powerful statistical methods are needed to succeed in identifying predisposing genetic variants and their environmental modifiers. This project focuses on developing statistical methods for analyzing genetic association studies of perinatal or early-life diseases. These studies very often adopt a retrospective case-control design, but they have a distinct feature: the offspring of case/control mothers (for perinatal diseases) or the parents of case/control offspring (for early-life diseases) are also recruited. Thus, these studies carry information on both unrelated case-control comparisons and genotype/haplotype transmissions within families. Another important feature of these studies is that the covariate distribution in the study population is structured, in that genetic and environmental variables are usually independent within families. The fact that such independence does not hold in the case population under the alternative hypothesis provides information on the association beyond the standard case-control comparison. These studies usually seek to evaluate the effects of both maternal and offspring genotypes/haplotypes, their interactions, and gene-environment interactions. Building on currently available approaches for the analysis of case-control association studies and case-parent triads, we propose novel, efficient estimation and testing methods that account for the retrospective case-control design and incorporate the family information on genotype/haplotype transmission and the structure in the covariate distribution.
Classical logistic regression for case-control studies is applicable to most of these analyses but is less efficient because it ignores the family information and the covariate structure. Transmission/disequilibrium-type tests and likelihood-based methods for analyzing case-parent triads discard the controls and/or their parents and cannot estimate all parameters of interest (e.g., main effects of environmental exposures). Our methods range from profile-likelihood methods and estimating-function-based methods to hybrid methods based on the conditional likelihood for case triads and pseudo-likelihoods. This project is motivated by, and will be applied to, ongoing scientific studies at the University of Pennsylvania on which the PI is collaborating; the phenotypes include pre-term birth, preeclampsia, hypospadias, and asthma. Our methods also have broad implications for the study of phenotypes other than perinatal and early-life diseases. We will develop large-sample theory for the proposed methods, evaluate their finite-sample performance in simulation studies, and demonstrate their usefulness using real data. Fully documented software implementing these methods will be provided for public use in the freely available statistical package R.
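For readers unfamiliar with the transmission/disequilibrium test mentioned above, the following is a minimal sketch of its classical McNemar-type form for a biallelic marker. It is not one of the methods proposed in this project. The function name `tdt_statistic` and the counts are hypothetical and chosen purely for illustration. The sketch makes visible why such a test uses only transmissions from heterozygous parents of cases, so control families contribute nothing to it.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical TDT statistic for one biallelic marker (illustrative sketch).

    transmitted   -- number of heterozygous parents of cases who passed the
                     candidate allele to the affected offspring
    untransmitted -- number of heterozygous parents of cases who passed the
                     other allele instead
    Returns the statistic and its asymptotic p-value (chi-square, 1 df).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no heterozygous (informative) parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, not taken from any study.
chi2, p_value = tdt_statistic(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Because the statistic depends only on transmission counts within case families, it cannot by itself estimate main effects of environmental exposures, which is the limitation noted in the abstract.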

Public Health Relevance

This project proposes novel statistical methods for the analysis of data arising from case-control genetic epidemiology studies of perinatal or early childhood diseases. The data usually consist either of case and control mothers and their respective offspring or of both case-parent triads and control-parent triads. The proposed methods are for the estimation and testing of maternal and offspring genotype/haplotype main effects, their interactions, and interactions between genotypes/haplotypes and environmental variables.

National Institutes of Health (NIH)
National Institute of Environmental Health Sciences (NIEHS)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Mcallister, Kimberly A
University of Pennsylvania
Biostatistics & Other Math Sci
Schools of Medicine
United States
Chen, Lu; Weinberg, Clarice R; Chen, Jinbo (2016) Using family members to augment genetic case-control studies of a life-threatening disease. Stat Med 35:2815-30
Li, Huilin; Chen, Jinbo (2016) Efficient unified rare variant association test by modeling the population genetic distribution in case-control studies. Genet Epidemiol 40:579-590
Yu, Kai; Zhang, Han; Wheeler, William et al. (2015) A robust association test for detecting genetic variants with heterogeneous effects. Biostatistics 16:5-16
Shen, Yuanyuan; Cai, Tianxi; Chen, Yu et al. (2015) Retrospective likelihood-based methods for analyzing case-cohort genetic association studies. Biometrics 71:960-8
Kang, Guolian; Lin, Dongyu; Hakonarson, Hakon et al. (2012) Two-stage extreme phenotype sequencing design for discovering and testing common and rare genetic variants: efficiency and power. Hum Hered 73:139-47
Chen, Jinbo; Kang, Guolian; Vanderweele, Tyler et al. (2012) Efficient designs of gene-environment interaction studies: implications of Hardy-Weinberg equilibrium and gene-environment independence. Stat Med 31:2516-30
Chen, Jinbo; Lin, Dongyu; Hochner, Hagit (2012) Semiparametric maximum likelihood methods for analyzing genetic and environmental effects with case-control mother-child pair data. Biometrics 68:869-77
Li, Yan; Li, Zhaohai; Graubard, Barry I (2011) Testing for Hardy Weinberg Equilibrium in national household surveys that collect family-based genetic data. Ann Hum Genet 75:732-41
Feng, Rui; Wu, Yinghua; Jang, Gun Ho et al. (2011) A powerful test of parent-of-origin effects for quantitative traits using haplotypes. PLoS One 6:e28909
Chen, Hua Yun; Chen, Jinbo (2011) On information coded in gene-environment independence in case-control studies. Am J Epidemiol 174:736-43