Population-based case-control studies with complex sampling, e.g., stratified multistage cluster sampling, are increasingly used to study the role of genetic variants and gene-environment (G-E) interplay in the etiology of human diseases. Retrospective-based logistic regression estimators have been developed to exploit various covariate-distributional assumptions to gain efficiency in such studies when cases and controls are selected with simple random sampling. These methods, however, can lead to invalid inferences when cases or controls are selected with complex sampling. Although most single nucleotide polymorphism (SNP)-based association studies with complex sampling account for the complications induced by complex designs, many haplotype-based genetic association studies with complex sampling ignore them in the estimation of haplotype frequencies, regression coefficients, or both. In this project, we will develop statistical methods that account for the design complications in haplotype-based association studies with complex sample designs. Specifically, attracted by the efficiency advantage of the retrospective method, we will explore the assumptions of Hardy-Weinberg equilibrium and G-E independence, and develop an efficient estimator suitable for case-control studies with a complex sample design. Analyses that rely on these assumptions, however, can be misleading when the assumptions fail. We will therefore relax them by proposing an empirical Bayes-type shrinkage estimator that trades off bias against efficiency. The proposed methods will be evaluated using simulations under various complex sample designs as well as data from two population-based case-control studies. Furthermore, a unified software package will be developed to disseminate the research outcomes widely.
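To illustrate the bias-efficiency trade-off behind an empirical Bayes-type shrinkage estimator, the sketch below uses one common scalar form: the assumption-free (robust) estimate is pulled toward the assumption-based (efficient) estimate, with a weight that shrinks as the two disagree. This is only an illustration under stated assumptions, not the estimator proposed in this project. The function name `eb_shrink` and its arguments are hypothetical, and the variance supplied is assumed to be a design-based variance that already reflects the complex sampling.

```python
def eb_shrink(beta_robust: float, var_robust: float, beta_efficient: float) -> float:
    """Scalar empirical Bayes-type shrinkage (illustrative sketch only).

    beta_robust    : estimate that does not rely on HWE or G-E independence
    var_robust     : its estimated variance (assumed design-based)
    beta_efficient : estimate that relies on those assumptions
    """
    if var_robust <= 0:
        raise ValueError("var_robust must be positive")
    # Observed disagreement between the two estimators; a large value
    # suggests the modeling assumptions are violated.
    psi = beta_robust - beta_efficient
    # Weight on the efficient estimator: near 1 when the estimators agree,
    # near 0 when the disagreement dominates the sampling variance.
    weight = var_robust / (var_robust + psi ** 2)
    return beta_robust - weight * psi


# Small disagreement relative to the variance: result sits between the two.
print(eb_shrink(beta_robust=1.2, var_robust=0.04, beta_efficient=1.0))
# Large disagreement: result stays close to the robust estimate.
print(eb_shrink(beta_robust=2.0, var_robust=0.01, beta_efficient=0.0))
```

When the two estimates coincide, the function returns their common value; as the gap grows relative to the sampling variance, the result moves toward the robust estimate, which limits bias when the assumptions fail.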
This project proposes innovative statistical methods that account for the design complications in haplotype-based association studies when controls, and possibly cases, are sampled with a complex sample design. The proposed methods will have general applications, and the results of this project will contribute to the understanding of the interplay between genetic susceptibility and environmental risk factors and provide an important resource for designing future population-based case-control studies.