The long-term objective of this research is to develop powerful statistical methods for analyzing data from genetic epidemiology studies. Although voluminous data are becoming available owing to the Human Genome Project and the rapid advancement of high-throughput genotyping technology, powerful statistical methods are still needed to identify predisposing genetic variants and their environmental modifiers. This project focuses on developing statistical methods for genetic association studies of perinatal or early-life diseases. These studies very often adopt a retrospective case-control design, but they have a distinctive feature: the offspring of case and control mothers (for perinatal diseases) or the parents of case and control offspring (for early-life diseases) are also recruited. These studies therefore carry information from both unrelated case-control comparisons and genotype/haplotype transmissions within families. Another important feature is that the covariate distribution in the study population is structured, so that genetic and environmental variables are usually independent within families. Because this independence does not hold in the case population under the alternative hypothesis, it provides information on the association beyond the standard case-control comparison. These studies usually seek to evaluate the effects of both maternal and offspring genotypes/haplotypes, their interactions, and gene-environment interactions. Building on currently available approaches for analyzing case-control association studies and case-parent triads, we propose novel, efficient estimation and testing methods that account for the retrospective case-control design and incorporate the family information on genotype/haplotype transmission and the structure in the covariate distribution.
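The point about gene-environment independence can be illustrated with a small calculation. The sketch below is not the project's method: it assumes, purely for illustration, a binary genotype G and a binary exposure E that are independent in a single population, and a logistic risk model with hypothetical coefficients `b0`, `b_g`, `b_e`, and `b_ge`. It computes the G-E odds ratio in the population, among cases, and among controls.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def ge_odds_ratios(p_g, p_e, b0, b_g, b_e, b_ge):
    """Return the G-E odds ratio in the population, in cases, and in controls.

    Assumptions (illustrative only): G and E are binary and independent in
    the population, and disease risk follows a logistic model with a
    multiplicative G-by-E interaction term b_ge.
    """
    pop, case, ctrl = {}, {}, {}
    for g in (0, 1):
        for e in (0, 1):
            # Independence: the joint probability is the product of margins.
            weight = (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
            risk = logistic(b0 + b_g * g + b_e * e + b_ge * g * e)
            pop[(g, e)] = weight
            case[(g, e)] = weight * risk          # proportional to P(G, E | case)
            ctrl[(g, e)] = weight * (1.0 - risk)  # proportional to P(G, E | control)

    def odds_ratio(t):
        # Normalizing constants cancel, so unnormalized weights suffice.
        return t[(1, 1)] * t[(0, 0)] / (t[(1, 0)] * t[(0, 1)])

    return odds_ratio(pop), odds_ratio(case), odds_ratio(ctrl)


# Hypothetical parameter values: a rare disease with an interaction odds ratio of 2.
or_pop, or_case, or_ctrl = ge_odds_ratios(0.3, 0.4, -5.0, 0.2, 0.3, math.log(2.0))
print(f"population: {or_pop:.3f}  cases: {or_case:.3f}  controls: {or_ctrl:.3f}")
```

Under these assumptions the population odds ratio is exactly 1, the ratio of the case odds ratio to the control odds ratio equals `exp(b_ge)`, and for a rare disease the case odds ratio alone is close to `exp(b_ge)`. This is the sense in which departure from independence among cases carries information about the interaction.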
Classical logistic regression for case-control studies applies to most of these analyses but is less efficient because it ignores family information and covariate structure. Transmission/disequilibrium-type tests and likelihood-based methods for analyzing case-parent triads discard the controls and/or their parents and cannot estimate all parameters of interest (e.g., main effects of environmental exposures). Our methods range from profile-likelihood methods and estimating-function-based methods to hybrid methods based on the conditional likelihood for case triads and pseudo-likelihoods. This project is motivated by, and will be applied to, ongoing scientific studies at the University of Pennsylvania on which the PI is collaborating; the phenotypes include pre-term birth, preeclampsia, hypospadias, and asthma. Our methods also have broad implications for the study of phenotypes other than perinatal and early-life diseases. We will develop large-sample theory for the proposed methods, evaluate their finite-sample performance in simulation studies, and demonstrate their usefulness using real data. Fully documented software implementing these methods will be made publicly available in the freely available statistical package R.
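For readers unfamiliar with the transmission/disequilibrium test mentioned above, the following is a minimal sketch of the classical biallelic TDT statistic, written in Python for illustration rather than in the project's R software. The function name `tdt` and the example counts are hypothetical. The inputs come only from heterozygous parents of affected offspring, which shows why controls and their parents contribute nothing to this test.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT statistic for one biallelic marker.

    transmitted:   number of heterozygous parents of affected offspring
                   who transmitted the candidate allele
    untransmitted: number who transmitted the other allele

    Returns the McNemar-type chi-square statistic (1 degree of freedom)
    and its asymptotic p-value.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no heterozygous parents: statistic is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p = tdt(60, 40)
print(f"TDT chi-square = {chi2:.2f}, p = {p:.4f}")
```

Because the statistic depends only on transmission counts within case families, it cannot estimate the main effect of an environmental exposure, consistent with the limitation noted above.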

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Research Project (R01)
Project #: 5R01ES016626-05
Application #: 8257883
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Mcallister, Kimberly A
Project Start: 2008-07-01
Project End: 2014-03-31
Budget Start: 2012-04-01
Budget End: 2014-03-31
Support Year: 5
Fiscal Year: 2012
Total Cost: $315,386
Indirect Cost: $99,666

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 042250712
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Wang, Lu; Damrauer, Scott M; Zhang, Hong et al. (2017) Phenotype validation in electronic health records based genetic association studies. Genet Epidemiol 41:790-800
Chen, Lu; Weinberg, Clarice R; Chen, Jinbo (2016) Using family members to augment genetic case-control studies of a life-threatening disease. Stat Med 35:2815-30
Li, Huilin; Chen, Jinbo (2016) Efficient unified rare variant association test by modeling the population genetic distribution in case-control studies. Genet Epidemiol 40:579-590
Yu, Kai; Zhang, Han; Wheeler, William et al. (2015) A robust association test for detecting genetic variants with heterogeneous effects. Biostatistics 16:5-16
Shen, Yuanyuan; Cai, Tianxi; Chen, Yu et al. (2015) Retrospective likelihood-based methods for analyzing case-cohort genetic association studies. Biometrics 71:960-8
Kang, Guolian; Lin, Dongyu; Hakonarson, Hakon et al. (2012) Two-stage extreme phenotype sequencing design for discovering and testing common and rare genetic variants: efficiency and power. Hum Hered 73:139-47
Chen, Jinbo; Kang, Guolian; Vanderweele, Tyler et al. (2012) Efficient designs of gene-environment interaction studies: implications of Hardy-Weinberg equilibrium and gene-environment independence. Stat Med 31:2516-30
Chen, Jinbo; Lin, Dongyu; Hochner, Hagit (2012) Semiparametric maximum likelihood methods for analyzing genetic and environmental effects with case-control mother-child pair data. Biometrics 68:869-77
Li, Yan; Li, Zhaohai; Graubard, Barry I (2011) Testing for Hardy Weinberg Equilibrium in national household surveys that collect family-based genetic data. Ann Hum Genet 75:732-41
Chen, Hua Yun; Chen, Jinbo (2011) On information coded in gene-environment independence in case-control studies. Am J Epidemiol 174:736-43

Showing the most recent 10 out of 11 publications