In case-control studies of gene-environment association with disease, when genetic and environmental exposures can be assumed to be independent in the underlying population, one may exploit the independence assumption to derive estimation techniques that are more efficient than traditional logistic regression analysis. Many classical results for case-control analysis, which treat the covariate distribution nonparametrically, do not hold when the space of exposure distributions is constrained. However, the efficiency gain of modern retrospective methods comes at the cost of robustness: when the gene-environment independence assumption is violated, the retrospective estimates can be severely biased. The main goal of this research proposal is to develop natural analytical tools that resolve this model specification dilemma in the retrospective analysis of gene-environment interaction under several commonly used epidemiological designs. Building on the profile-likelihood framework developed by Chatterjee and Carroll (2005, Biometrika), the investigator proposes a Bayesian approach that incorporates uncertainty about the assumed gene-environment independence constraint in a natural, data-adaptive way. The proposed shrinkage estimator, conceived from a Bayesian standpoint, is designed to retain attractive efficiency properties without relying on unverifiable model constraints. Theoretical properties of the proposed estimator are studied under varying scenarios of gene-environment association. The investigator considers both empirical Bayes and hierarchical Bayes methods for relaxing the gene-environment independence assumption, and explores the connection between these Bayesian approaches and an alternative random-effects model. The methods are extended beyond the commonly used unmatched case-control design to two-phase and family-based studies of gene-environment interaction.
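To make the bias-efficiency trade-off concrete, here is a minimal sketch of a data-adaptive shrinkage estimator for a multiplicative gene-environment interaction. It assumes binary G and E, a rare disease, and hypothetical 2x2 count tables for cases and controls. The unconstrained estimate is the interaction coefficient of a saturated logistic model (the ratio of G-E odds ratios in cases vs. controls). The constrained estimate is the simple case-only estimator, used here as a stand-in for, not a reproduction of, the Chatterjee-Carroll profile-likelihood estimate. The empirical Bayes-type weight theta^2 / (theta^2 + var) is likewise an assumption for illustration and is not necessarily the weighting developed in this proposal. All function names and counts are hypothetical.

```python
import math
from typing import Dict, NamedTuple, Tuple

# Hypothetical layout: counts[(g, e)] = number of subjects with
# binary genotype g and binary exposure e (both coded 0/1).
Counts = Dict[Tuple[int, int], int]


class GxEEstimates(NamedTuple):
    beta_cc: float   # unconstrained case-control (logistic) interaction estimate
    var_cc: float    # Woolf variance of beta_cc
    beta_co: float   # case-only estimate, valid only under G-E independence
    weight: float    # shrinkage weight placed on beta_cc
    beta_eb: float   # empirical Bayes-type shrinkage estimate


def log_odds_ratio(t: Counts) -> Tuple[float, float]:
    """G-E log odds ratio of a 2x2 table with its Woolf (delta-method) variance."""
    a, b, c, d = t[(1, 1)], t[(1, 0)], t[(0, 1)], t[(0, 0)]
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def shrinkage_gxe(cases: Counts, controls: Counts) -> GxEEstimates:
    lor_cases, var_cases = log_odds_ratio(cases)
    lor_controls, var_controls = log_odds_ratio(controls)

    # Saturated logistic model: the interaction coefficient equals the
    # log ratio of the G-E odds ratios in cases vs. controls.
    beta_cc = lor_cases - lor_controls
    var_cc = var_cases + var_controls

    # Case-only estimate: G-E log odds ratio among cases alone. More
    # efficient, but biased if G and E are associated in the population.
    beta_co = lor_cases

    # theta estimates the bias of the case-only estimate (it equals minus the
    # control log odds ratio). Large theta relative to var_cc pushes the weight
    # toward the robust beta_cc; theta near 0 favors the efficient beta_co.
    theta = beta_cc - beta_co
    weight = theta ** 2 / (theta ** 2 + var_cc)
    beta_eb = weight * beta_cc + (1 - weight) * beta_co
    return GxEEstimates(beta_cc, var_cc, beta_co, weight, beta_eb)


if __name__ == "__main__":
    cases = {(1, 1): 40, (1, 0): 60, (0, 1): 80, (0, 0): 220}
    controls = {(1, 1): 70, (1, 0): 90, (0, 1): 110, (0, 0): 230}
    est = shrinkage_gxe(cases, controls)
    for name, value in est._asdict().items():
        print(f"{name:8s} {value: .4f}")
```

Under this weighting, when the controls show no G-E association, the weight is zero and the estimate coincides with the efficient case-only estimate. As the control data reveal stronger G-E dependence, the estimate moves toward the unconstrained logistic estimate, and it always lies between the two.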

Two scientific streams currently play extremely important roles in clinical medicine and public health: molecular biology, with an emphasis on genetics, and quantitative science, with an emphasis on epidemiology. Together, developments in these areas are making fundamental contributions to the study of the etiology, diagnosis, prognosis and treatment of complex diseases. Rapid advances in medical science and genetic technology are raising many complex design and analysis issues that statisticians and epidemiologists have never confronted before. This proposal lies at that new interface of human genetics, epidemiology and statistics. Case-control studies are increasingly used to study the association between a disease and a candidate gene. However, apart from some rare diseases, such as Huntington's disease or Tay-Sachs disease, which are attributable to a defect in a single gene, most common human diseases have a multifactorial etiology involving a complex interplay of many genetic and environmental factors. Identifying and characterizing such gene-environment interactions through clinical and epidemiological studies creates more opportunities to understand the genesis and etiology of complex diseases and to develop targeted intervention strategies for high-risk individuals. The proposal presents robust and efficient statistical techniques for investigating the synergism between genes and environment in complex diseases. The high-performance computing tools developed in the proposal make it feasible to apply the methods in large-scale settings such as genome-wide association studies.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0706935
Program Officer: Gabor J. Szekely
Budget Start: 2007-06-01
Budget End: 2010-05-31
Fiscal Year: 2007
Total Cost: $134,451
Organization: University of Michigan Ann Arbor
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109