Population-based case-control studies with complex sampling, e.g., stratified multistage cluster sampling, are increasingly used to study the role of genetic variants and gene-environment (G-E) interplay in the etiology of human diseases. Retrospective-based logistic regression estimators have been developed to exploit various covariate-distributional assumptions to gain efficiency in such studies when cases and controls are selected with simple random sampling. These methods, however, can lead to invalid inferences when cases or controls are selected with complex sampling. Although most single nucleotide polymorphism (SNP)-based association studies with complex sampling account for the complications induced by complex designs, many haplotype-based genetic association studies with complex sampling tend to ignore them in the estimation of haplotype frequencies, regression coefficients, or both. In this project, we will develop statistical methods that account for the design complications in haplotype-based association studies with complex sample designs. Specifically, motivated by the efficiency advantage of the retrospective method, we will explore the assumptions of Hardy-Weinberg equilibrium and G-E independence, and develop an efficient estimator suitable for case-control studies with a complex sample design. However, analysis under these assumptions can be misleading when the assumptions fail. We will therefore relax them by proposing an empirical Bayes-type shrinkage estimator that trades off bias against efficiency. The proposed methods will be evaluated using simulations under various complex sample designs as well as data from two population-based case-control studies. Furthermore, a unified software package will be developed to disseminate the research outcomes widely.
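
As a rough illustration of the bias-efficiency trade-off behind a shrinkage estimator of this kind, the Python sketch below combines an assumption-free estimate with an assumption-based one for a single coefficient. It is not the estimator proposed in this project: the function name, its arguments, and the scalar weighting rule (the variance of the assumption-free estimate relative to that variance plus the squared disagreement between the two estimates) are assumptions chosen for illustration, and the sketch ignores the survey weights and design-based variance estimation that a complex sample design would require.

```python
def shrinkage_estimate(beta_free, beta_constrained, var_free):
    """Illustrative scalar shrinkage between two estimates of one coefficient.

    beta_free        -- estimate that does not rely on the modeling assumptions
    beta_constrained -- estimate that relies on them (e.g. HWE, G-E independence)
    var_free         -- estimated variance of beta_free

    All names are hypothetical; this is a sketch, not the project's method.
    """
    # Squared disagreement between the two estimates, used here as a crude
    # signal of how badly the assumptions may be violated.
    diff_sq = (beta_free - beta_constrained) ** 2
    denom = var_free + diff_sq
    if denom == 0:
        # Identical estimates and zero variance: nothing to trade off.
        return beta_constrained
    # Weight on the assumption-based estimate: close to 1 when the two
    # estimates agree relative to sampling noise, close to 0 when they differ.
    weight_constrained = var_free / denom
    return weight_constrained * beta_constrained + (1 - weight_constrained) * beta_free


if __name__ == "__main__":
    # Estimates that agree closely: the result stays near the constrained value.
    print(shrinkage_estimate(0.52, 0.50, 0.04))
    # Estimates that disagree strongly: the result moves toward the free value.
    print(shrinkage_estimate(2.0, 0.0, 0.01))
```

When the two estimates agree, the combined value leans on the more efficient assumption-based estimate; when they disagree by much more than sampling noise would explain, it falls back toward the assumption-free estimate.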

Public Health Relevance

This project proposes innovative statistical methods that account for the design complications in haplotype-based association studies when controls, and possibly cases, are sampled with a complex sample design. The proposed methods will have general applications, and the results of this project will contribute to the understanding of the interplay between genetic susceptibility and environmental risk factors and provide an important resource for designing future population-based case-control studies.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Small Research Grants (R03)
Project #
5R03CA171064-02
Application #
8740470
Study Section
Special Emphasis Panel (ZCA1-SRLB-4 (J2))
Program Officer
Feuer, Eric J
Project Start
2013-09-24
Project End
2016-08-31
Budget Start
2014-09-01
Budget End
2015-08-31
Support Year
2
Fiscal Year
2014
Total Cost
$1
Indirect Cost
Name
University of Maryland College Park
Department
Social Sciences
Type
Schools of Arts and Sciences
DUNS #
790934285
City
College Park
State
MD
Country
United States
Zip Code
20742