The long-term objective of this research is to develop powerful statistical methods for analyzing data from genetic epidemiology studies. Although voluminous data are becoming available owing to the Human Genome Project and the rapid advancement of high-throughput genotyping technology, powerful statistical methods are still needed to identify predisposing genetic variants and their environmental modifiers. This project focuses on developing statistical methods for genetic association studies of perinatal and early-life diseases. These studies often adopt a retrospective case-control design, but they have a distinctive feature: the offspring of case and control mothers (for perinatal diseases) or the parents of case and control offspring (for early-life diseases) are also recruited. Thus, these studies carry information both from unrelated case-control comparisons and from genotype/haplotype transmissions within families. Another important feature is that the covariate distribution in the study population is structured so that genetic and environmental variables are usually independent within families. Under the alternative hypothesis, this independence does not hold in the case population, and that departure provides information on the association beyond the standard case-control comparison. These studies usually seek to evaluate the effects of both maternal and offspring genotypes/haplotypes, their interactions, and gene-environment interactions. Building on currently available approaches for analyzing case-control association studies and case-parent triads, we propose novel, efficient estimation and testing methods that account for the retrospective case-control design and incorporate both the family information on genotype/haplotype transmission and the structure in the covariate distribution.
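The claim that gene-environment independence breaks down among cases can be illustrated with a small calculation. The sketch below is a simplification made for illustration. It assumes a binary genotype G and a binary exposure E that are independent in the population, not merely within families. It also assumes a log-linear (rather than logistic) risk model. The function name and all parameter values are hypothetical, and the sketch is not one of the project's proposed methods. Under these assumptions, the G-E odds ratio among cases equals exp(b_ge) exactly. It is therefore 1 when there is no multiplicative interaction and departs from 1 when there is one.

```python
import math

def case_ge_odds_ratio(p_g, p_e, alpha, b_g, b_e, b_ge):
    """Odds ratio between binary G and E among cases (hypothetical helper).

    Assumes P(G, E) = P(G) P(E) in the population and the log-linear risk
    model P(D=1 | G, E) = exp(alpha + b_g*G + b_e*E + b_ge*G*E), with alpha
    small enough that every risk stays below 1.
    """
    joint = {}
    for g in (0, 1):
        for e in (0, 1):
            pop = (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
            risk = math.exp(alpha + b_g * g + b_e * e + b_ge * g * e)
            if risk >= 1:
                raise ValueError("risk model must give probabilities below 1")
            # Proportional to P(G=g, E=e | D=1); the normalizing constant
            # P(D=1) cancels in the odds ratio below.
            joint[(g, e)] = pop * risk
    return (joint[(1, 1)] * joint[(0, 0)]) / (joint[(1, 0)] * joint[(0, 1)])
```

For example, `case_ge_odds_ratio(0.3, 0.4, -5.0, 0.5, 0.7, 0.0)` returns 1 up to floating-point rounding, even though G and E both raise risk. Setting `b_ge=math.log(2)` gives an odds ratio of 2 among cases. This is the kind of structure among cases that a standard case-control comparison does not exploit.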
Classical logistic regression for case-control studies can be applied to most of these analyses, but it is less efficient because it ignores the family information and the covariate structure. Transmission/Disequilibrium Test (TDT)-type tests and likelihood-based methods for case-parent triads discard the controls and/or their parents and cannot estimate all parameters of interest (e.g., main effects of environmental exposures). Our methods range from profile-likelihood and estimating-function-based methods to hybrid methods based on the conditional likelihood for case triads and on pseudo-likelihoods. This project is motivated by, and will be applied to, ongoing scientific studies at the University of Pennsylvania on which the PI collaborates; the phenotypes include preterm birth, preeclampsia, hypospadias, and asthma. Our methods also have broad implications for the study of phenotypes other than perinatal and early-life diseases. We will develop large-sample theory for the proposed methods, evaluate their finite-sample performance in simulation studies, and demonstrate their usefulness on real data. Fully documented software implementing these methods will be released for public use in the freely available statistical package R.
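The following minimal Python sketch shows which data TDT-type tests use and which they discard. It implements the classic allelic TDT for one biallelic marker. It assumes genotypes coded as counts (0, 1, 2) of a putative risk allele and complete, Mendelian-consistent case-parent triads. The function name `tdt` and this coding are illustrative assumptions and are not the project's software, which is planned in R. Only heterozygous parents of cases enter the statistic chi2 = (b - c)^2 / (b + c). Controls, their parents, and any environmental covariates never appear in it, which is why exposure main effects cannot be estimated this way.

```python
import math
from typing import Iterable, Tuple

def tdt(triads: Iterable[Tuple[int, int, int]]) -> Tuple[int, int, float, float]:
    """Allelic TDT from case-parent triads at one biallelic marker.

    Each triad is (mother, father, child), coded as the count (0, 1, 2) of
    the putative risk allele A. Returns (b, c, chi2, p), where b counts
    transmissions of A from heterozygous parents and c counts
    transmissions of the other allele.
    """
    b = c = 0
    for m, f, k in triads:
        hets = (m == 1) + (f == 1)
        if hets == 2:
            # Both parents Aa: child AA -> two A's transmitted,
            # aa -> two a's, Aa -> one of each.
            b += k
            c += 2 - k
        elif hets == 1:
            hom = f if m == 1 else m      # genotype of the homozygous parent
            from_het = k - hom // 2       # A count passed on by the het parent
            if from_het not in (0, 1):
                raise ValueError(f"Mendelian inconsistency in triad {(m, f, k)}")
            b += from_het
            c += 1 - from_het
        # hets == 0: homozygous parents carry no transmission information.
    if b + c == 0:
        return b, c, 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))    # upper tail of chi-square, 1 df
    return b, c, chi2, p
```

For example, `tdt([(1, 0, 1), (1, 0, 1), (1, 0, 0), (1, 1, 2), (0, 2, 1), (2, 1, 2)])` gives b = 5 and c = 1, so chi2 = 8/3 (about 2.67) and p is about 0.10. The (0, 2, 1) triad contributes nothing because neither parent is heterozygous.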